Identification of Quantitative Trait Loci (QTL) for Awn, Incomplete Panicle Exertion and Total Spikelet Number in an F2 Population Derived from A Backcross Inbred Line, Bio-148, and the Recurrent Parent, IR64

An F2 rice population developed from a cross between a backcross inbred line (BIO-148) and its recurrent parent (IR64) was used to identify quantitative trait loci (QTL) for awn, panicle exertion and total spikelet number. BIO-148 is a BC2F8 line derived from a cross between IR64 (a high-yielding lowland rice variety) and Gajah Mungkur (an upland tropical japonica rice variety). Two hundred plants were grown in the greenhouse, and their DNAs were isolated for genotyping using SSR markers. Panicle exertion was observed during the grain-filling stage. The awn length of the seed and the total spikelet number per panicle were observed after harvesting. A total of four QTLs were identified using single-marker regression with LOD>3, explaining 8.4-18.1% of phenotypic variation. A QTL for awn was identified on Chromosome 8. A QTL for incomplete panicle exertion was identified on Chromosome 4. Two QTLs for total spikelet number were identified on Chromosome 4, in which the BIO-148 allele contributed to a higher number of spikelets per panicle. The QTLs identified in this study will be useful in the improvement of yield potential for modern lowland indica rice varieties by harnessing the hidden useful alleles from upland tropical japonica rice varieties.


Introduction
Rice (Oryza sativa), as the most important crop in the world and a model plant species [1], has undergone rapid development in research breeding programs over the last decade. Rice is the staple food in many parts of the world and for more than half of the world's population [2,3]. The human population is projected to reach 9 billion people by 2050 [4], which will result in an increasing demand for rice.
Increasing rice yield potential has been the main breeding objective in rice improvement programs [5,6] because there has been a stagnation in productivity due to the narrowing genetic diversity of rice [2]. Efforts to create higher-yield-potential varieties are incessantly undertaken because average farm yield is largely influenced by the yield potential of crop varieties [7].
According to Evans (1993) [8], yield potential is the yield of a variety growing in its adapted environment under optimum conditions, with all stresses effectively controlled. An increase in yield potential was achieved in the late 1950s and early 1960s via the development of semi-dwarf varieties in China and at the International Rice Research Institute (IRRI) [7].
Yield stagnation in newly developed rice varieties has been observed in the tropics since the release of the first semi-dwarf variety, IR8, in 1966 [9]. This was reflected in the sharp drop in the annual rate of yield increase from 2.7% in the 1980s to 1.1% in the 1990s [10], and yields seem to have reached a plateau at about 8 to 9 t/ha [3]. As efforts continue to further increase yield and break the yield ceiling, rice breeding today has become more challenging than ever [11].
The availability of information about molecular markers, genetic mapping and the full sequences of plant genomes that are currently evolving [12] enables breeders to identify areas in the genome that are associated with specific components of the phenotype and determine which parent contributes the desired alleles at a particular locus [13].
In Indonesia, one of the strategies that has been developed in the rice breeding program is the application of marker-assisted selection (MAS). MAS is widely accepted in the world due to its great potential to improve the efficiency and precision of conventional plant breeding [1]. Using MAS, brown plant hopper (BPH) resistant elite breeding lines were successfully developed via foreground and background analysis in a japonica background without linkage drag [14].
SSR markers are the most widely accepted markers for mapping QTLs in rice [1]. QTLs based on SSR markers are stable, found in abundant quantities and have high heritability [15]. Therefore, these markers are widely used for the genetic mapping of certain characters.
BIO-148 is a BC2F8 line derived from a cross between IR64 (a high-yielding lowland rice variety) and Gajah Mungkur (an upland tropical japonica rice variety) through a conventional breeding program at ICABIOGRAD [16, unpublished]. Bio-148 has shown high yield potential, producing a total of about 250-300 spikelets per panicle under screen-house planting [Trijatmiko, personal comm.; 16], as shown in a multi-location test [17]. The line has longer and wider flag leaves, longer panicles, greater plant height, a large number of tillers, and an early heading date. These traits have the potential to be combined with IR64.
As the donor parent of Bio-148, the Gajah Mungkur variety also contributed better yield component traits (heavier 1000-grain weight, longer panicles, greater panicle branch density, and a larger grain number per panicle) [16] as compared to IR64. This variety is known as a distant progeny of IRAT112 (a tropical japonica rice strain from Africa), which is commonly used as a donor parent to provide the drought-resistance trait in rice breeding programs [18].
IR64 is a popular variety that is widely grown in Indonesia and has been widely used as a source of genetic background for many breeding purposes [19,20]. It was first released by the IRRI in 1985, and since then, it has been widely accepted as a high-quality rice variety in many countries [21]. Due to its elite characteristics, such as high yield potential, shorter growth duration, good eating quality and enhanced resistance to several diseases and insect pests [22], IR64 predominated in almost all rice plantations in Indonesia until 2002 [19,20]. Several lines with the IR64 genetic background, such as doubled haploid (DH) lines, recombinant inbred lines (RIL), and thousands of mutant lines, have been developed for genetic analysis and improvement [23].
It is challenging to improve Bio-148, which uses the IR64 genome as its genetic background, and make it suitable as a candidate for breeding superior varieties. Using Bio-148 as the donor parent for higher spikelet number per panicle and combining it with IR64, we expect to generate a variety with characteristics similar to IR64 but with higher yield potential.
The obstacle that remains for the use of Bio-148 as a candidate in a breeding program is the existence of undesirable traits, i.e., a high percentage of panicle enclosure (incomplete panicle exertion) and the presence of awns on grains [Trijatmiko, personal comm.; 16]. The high percentage of incomplete panicle exertion was reported to be the major cause of yield losses with Bio-148 due to the reduced number of grains available to be harvested.
There has not yet been any research on mapping QTLs associated with the awn and incomplete panicle exertion traits of Bio-148. We believe it is important to map QTLs for these traits in order to make it easier to eliminate these unfavourable traits during selection. Moreover, there are many opportunities to explore new QTLs associated with yield components.
The objective of this study is to identify the position of the major QTL markers associated with the presence of awns in rice grains, incomplete panicle exertion and the total number of spikelets per panicle in an F2 population derived from crosses between the Bio-148 strain and IR64 variety.

Material and Methods
Plant material, population development and phenotypic evaluation. Bio-148 (P1) and IR64 (P2) were crossed to generate an F2 population for mapping in March of 2012 in the greenhouse of ICABIOGRAD. Single-seed descent was performed using F1 seed from the Bio-148/IR64 cross to develop an F2 population. In total, 200 seeds were planted from an F1 plant and allowed to undergo selfing to generate an F2 population. The 21-day-old seedlings of the parents and 200 F2 progenies were transplanted into pots with a diameter of 30 cm.
Phenotyping was conducted on the F2 population at the grain-filling stage. Panicle exertion is the exertion of the panicle above the flag leaf sheath after anthesis [24]. The degree of panicle exertion was measured as the distance from the flag leaf ligule to the panicle node (cm). Positive and negative values indicated complete and incomplete panicle exertion, respectively [24]. The proportion of incomplete panicle exertion was then calculated as the number of panicles with incomplete exertion divided by the total number of observed panicles. This proportion ranges from 0 to 1. Values near 0 mean that complete panicle exertion predominates, and values near 1 indicate the predominance of incomplete panicle exertion.
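The proportion described above can be sketched in a few lines of Python. This is an illustration only: the function name and the sample measurements are hypothetical, and a distance of exactly zero is treated as complete exertion, which is an assumption the text does not address.

```python
def incomplete_exertion_proportion(exertion_cm):
    """Share of panicles whose ligule-to-panicle-node distance is negative.

    Negative distances mark incomplete exertion, as in the text. A distance
    of exactly 0 is counted as complete (assumption, not from the source).
    """
    if not exertion_cm:
        raise ValueError("at least one panicle measurement is required")
    incomplete = sum(1 for distance in exertion_cm if distance < 0)
    return incomplete / len(exertion_cm)


# Hypothetical measurements (cm) for the panicles of a single plant.
plant_panicles = [2.5, -1.0, 0.8, -3.2, 1.1]
print(incomplete_exertion_proportion(plant_panicles))
```

A plant with only positive distances scores 0, and a plant whose panicles are all enclosed scores 1.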
The awn is a long needle-like appendage that, in some grass species, is formed on the lemma that encloses the floral organs together with the palea [25]. Awn length was measured using ten main panicles of each plant in the F2 population. The awn of the apical spikelet of the primary branches was measured (cm) and used to represent the awn length of the entire panicle.
Preparation of DNA and DNA visualization. Genomic DNA isolation was performed using a modified CTAB protocol [26]. The quality and quantity of DNA were measured using a NanoDrop 2000 (Thermo Scientific), which determines DNA quality based on the ratio of absorbance at 260 nm to that at 280 nm. Good purity is reached if the value is within the 1.8-2.0 range [27].
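The purity criterion can be expressed as a simple check. The sketch below is illustrative: the function names and absorbance readings are hypothetical, and only the 1.8-2.0 window comes from the text.

```python
def purity_ratio(a260, a280):
    """Return the A260/A280 absorbance ratio."""
    if a280 <= 0:
        raise ValueError("absorbance at 280 nm must be positive")
    return a260 / a280


def is_acceptably_pure(a260, a280, low=1.8, high=2.0):
    """True when the ratio falls within the 1.8-2.0 window cited in the text."""
    return low <= purity_ratio(a260, a280) <= high


# Hypothetical readings for two DNA samples.
print(is_acceptably_pure(0.95, 0.50))  # ratio 1.9
print(is_acceptably_pure(0.80, 0.50))  # ratio 1.6
```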
A parental survey was performed using 549 rice SSR markers, which were randomly selected from the primer collection of the molecular laboratory at ICABIOGRAD. PCR amplification was performed in a 10 µL volume containing 2 µL of 50 ng genomic DNA as a template, 5.68 µL ddH2O, 2 µL PCR buffer (DreamTaq buffer), 0.2 µL dNTPs (dATP, dCTP, dGTP and dTTP), 1 µL SSR primer (mixed forward and reverse primers) and 0.12 µL Taq DNA polymerase.
These reagents were dispensed into each well of a 96-well PCR plate, along with 1 drop of mineral oil. PCR was performed in an MJ Research PTC-100 thermal cycler, with initial denaturation at 94 °C for 4 min, followed by 35 cycles of denaturation at 94 °C for 1 min, annealing at 55 °C for 30 s and extension at 72 °C for 2 min, and a final extension at 72 °C for 7 min.
DNA fragments, as a PCR product, were stained with SYBR-Safe (Invitrogen, Carlsbad, USA) and then separated on 8% acrylamide gels (C.B.S. Scientific, Del Mar, USA) via electrophoresis, using 1x TBE for manual allele scoring. The program for electrophoresis was as follows: 120 minutes, 70 volts and 500 mA. For further staining, we soaked the gel in 1% EtBr solution for 5 minutes and rinsed it with distilled water. Visualization was performed under trans-UV light using GelDoc.
Polymorphic markers were used to genotype the F2 population, and a linkage map was made based on 24 selected markers. Markers were selected on the basis of their polymorphic banding pattern, that is, a clear band and a distinct pattern between the two parents (Figure 1). The distance between the two bands of a polymorphic marker should be obvious and distinct in order to make scoring or genotyping easier. The order of markers was confirmed using the complete genome sequence and the results of sequencing- and physical-distance-based Genome Annotation ver. 3.
The phenotypic and genotypic data were analyzed via single marker regression (SMR) using QGene ver. 4.3.8 [28]. The following parameters were set for the mapping: the population structure is F2, the type of cross (mating string) is "s" and the genotype symbols are ABHxx -. A permutation test with 10,000 iterations was used to determine the threshold for the QTLs in QGene. Subsequently, LOD values at p<0.05 were used as the threshold to determine the significance of the QTLs. QTL names were designated by following the standard rice QTL nomenclature [29].
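To make the idea of single marker regression concrete, the following Python sketch regresses a phenotype on one marker and converts the fit into a LOD score. It is not QGene's implementation: the additive coding of genotypes as 0, 1 and 2 copies of one parental allele, the function name and the toy data are all assumptions made for illustration.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """Regress phenotype on allele dosage at one marker.

    genotypes: 0, 1 or 2 per plant (assumed additive coding); None = missing.
    Returns (lod, r_squared, slope).
    """
    pairs = [(g, y) for g, y in zip(genotypes, phenotypes) if g is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three genotyped plants")
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    rss_null = sum((y - mean_y) ** 2 for _, y in pairs)  # model without marker
    if sxx == 0 or rss_null == 0:
        raise ValueError("marker or phenotype shows no variation")
    slope = sxy / sxx
    rss_marker = rss_null - slope * sxy  # residual sum of squares with marker
    r_squared = 1 - rss_marker / rss_null
    if rss_marker <= 0:
        return math.inf, r_squared, slope
    lod = (n / 2) * math.log10(rss_null / rss_marker)
    return lod, r_squared, slope


# Toy data for ten hypothetical plants.
toy_genotypes = [0, 0, 1, 1, 2, 2, 1, 0, 2, 1]
toy_phenotypes = [10, 12, 14, 15, 19, 18, 13, 11, 20, 16]
lod, r_squared, slope = single_marker_lod(toy_genotypes, toy_phenotypes)
print(f"LOD={lod:.2f}  R2={r_squared:.3f}  additive effect={slope:.2f}")
```

Under this model the LOD score and the proportion of phenotypic variation explained are linked by R² = 1 - 10^(-2·LOD/n), where n is the number of genotyped plants.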
Normality was examined using the Kolmogorov-Smirnov test. The correlation between characters was evaluated using the Pearson correlation coefficient, and a t-test was used to determine whether the correlations among the tested traits were significant.
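As a reminder of how the correlation test works, the sketch below computes the Pearson coefficient and the associated t statistic with n - 2 degrees of freedom. The function names and the toy data are hypothetical; the statistical software used in the study is not stated in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data for five hypothetical plants.
trait_a = [1, 2, 3, 4, 5]
trait_b = [2, 1, 4, 3, 5]
r = pearson_r(trait_a, trait_b)
print(r, r * r, correlation_t_statistic(r, len(trait_a)))
```

The squared coefficient, r², gives the share of variation in one trait that is associated with the other, and the t statistic is compared with the critical value of Student's t distribution at the chosen probability level.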

Results and Discussion
Based on the phenotypic characterization, there is a large variation in awn length and the type of panicle exertion in the F2 population (Figure 2 and Figure 3). In contrast to total spikelet number per panicle, for which the phenotypic variation is within the range of the parents, both of these traits showed a distribution of phenotypic variation beyond the range of the parents (Table 1).
The average value of awn length observed in the F2 population was 0.27 cm larger than that of IR64. This suggested that the transgressive segregation of this trait occurred in one direction and was skewed towards the recurrent parent, IR64. Regarding panicle exertion, although the transgressive segregation was not as pronounced as for awn length, the trait showed a frequency distribution that is skewed towards the female parent, Bio-148.
Based on the curve in Figure 4, for awn length, among 200 individuals in the F2, 67 individuals showed shorter awns as compared to P1 (Bio-148), as indicated by the presence of one peak in the histogram near the P1 value. However, interestingly, the curve is skewed to the right, towards the P2 distribution (IR64).
According to Figure 5, incomplete panicle exertion shows a multi-modal distribution, as indicated by the presence of three peaks in the histogram. Two peaks fall beyond both parents, and one peak lies between the parents. This means that the frequency distributions for awn length and incomplete panicle exertion in the F2 population do not fit the normal distribution. This is also supported by the p-value for both traits, which is 0.0002 (Table 1): both distributions differ significantly from normal based on the Kolmogorov-Smirnov test.

Figure 2. Phenotypic Variation of Panicle Exertion
According to Falconer and Mackay (1989) [30], in a segregating population, a non-normal distribution of a particular trait is probably caused by the presence of major gene effects. If a gene has an effect that is large enough relative to the background genetic and environmental variation, it will produce a multimodal distribution within a segregating population [30]. A gene whose effect is not large enough to cause a multimodal distribution may nevertheless cause a detectable departure from normality [30].
According to Figure 6, the frequency distribution for spikelet number per panicle in the F2 population was close to normal. The normality value for this trait was 0.059, and the p-value was 0.5025 (Table 1). In terms of spikelet number per panicle, this study showed that transgressive segregation did not occur within this trait, although the distribution was skewed towards Bio-148 (the parent with the higher spikelet number). Transgressive segregation is defined as the appearance of individuals in segregating populations that fall outside of their parental phenotypes [31].
Overall, the distribution of the observed phenotypic traits shows continuous segregation and skewness ranging between -0.77 and 2.3, suggesting that these characters are inherited quantitatively, which means that these traits are under polygenic control and that both parents, Bio-148 and IR64, contributed genes for these characters.

Correlation analysis.
A total of six pairwise combinations were formed among four traits, of which only one combination was significant at the 1% probability level and two combinations were significant at the 5% level (Table 2). The correlation analysis revealed that incomplete panicle exertion had a significant negative correlation with yield (r = -0.094*). The association is weak: the corresponding coefficient of determination is r² ≈ 0.009, so less than 1% of the variation in yield is associated with incomplete panicle exertion. Awn length had a positive but weak and non-significant correlation with yield (r = 0.055ns). Awn and incomplete panicle exertion (IPE) were both negatively correlated with total spikelet number, but only the correlation for awn was significant (r = -0.130*), while that for IPE was not (Table 2).
In terms of panicle exertion, according to our visual observations, incomplete panicle exertion makes panicles susceptible to bacterial and fungal diseases, i.e., bacterial leaf blight and blast, smut or sheath blight, and leads to abnormal nutrient distribution to the inner grains of the panicle. High humidity supports the development of plant pathogens. Grains or seeds within enclosed panicles that were exposed to higher humidity were vulnerable to pathogenic invasion and disease. Seeds are easily attacked by rot fungus, which can then infect other seeds inside the panicle (Figure 7). Cumulatively, these conditions contribute significantly to the decrease in grain weight and the number of seeds produced.
Marker segregation and Chi-square. The number of SSR markers used in the survey of polymorphism between IR64 and Bio-148 was 549. The markers were spread randomly over the twelve rice chromosomes, with numbers ranging from 30 to 83 markers per chromosome. Chromosome 1 was surveyed using the largest number of SSR markers (83 markers), whereas Chromosomes 11 and 12 were surveyed with the fewest markers (30 markers). The differences in the number of molecular markers per chromosome are merely due to the availability of primer stock in the laboratory collection at the time of this study.
Of the 549 SSR markers used for the parental survey, 64 markers (11.66%) showed polymorphism between the two parents (Table 4).
The results indicate that segregation distortion may occur at two loci, RM448 and RM474. The alleles at these loci do not segregate independently or do not have the same opportunities to segregate during gamete formation. Significant segregation distortion was also observed in markers RM402 and RM441 (data not shown), in which the alleles segregated with skewness, following only one parent. These two markers were excluded from the subsequent analysis and replaced by other markers.
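A chi-square goodness-of-fit test for one marker can be sketched as follows. The sketch assumes a codominant marker with an expected 1:2:1 ratio (homozygous P1 : heterozygous : homozygous P2) in the F2, which is the standard Mendelian expectation rather than a detail stated in the text; the function names and the genotype counts are hypothetical.

```python
# Critical chi-square value for 2 degrees of freedom at alpha = 0.05.
CHI2_CRITICAL_DF2_ALPHA05 = 5.991


def chi_square_f2(count_p1, count_het, count_p2):
    """Chi-square statistic against the assumed 1:2:1 F2 expectation."""
    total = count_p1 + count_het + count_p2
    if total == 0:
        raise ValueError("no genotyped plants")
    observed = (count_p1, count_het, count_p2)
    expected = (total / 4, total / 2, total / 4)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(count_p1, count_het, count_p2):
    """True when the marker departs from 1:2:1 at the 5% level."""
    return chi_square_f2(count_p1, count_het, count_p2) > CHI2_CRITICAL_DF2_ALPHA05


# Hypothetical genotype counts for two markers scored on 200 plants.
print(chi_square_f2(50, 100, 50), is_distorted(50, 100, 50))
print(chi_square_f2(80, 90, 30), is_distorted(80, 90, 30))
```

A marker whose statistic exceeds the critical value would be flagged as showing segregation distortion.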
Although several markers exhibit segregation distortion, this does not mean that the markers do not play important roles in detecting recombinants. According to Shizong (2008) [32], the use of markers with segregation distortion may increase the clarity of the obtained mapping, as well as being helpful in statistical analysis for QTL mapping. In this study, the markers RM474 and RM448 remain included in the analysis of linkage markers.
Segregation distortion can be caused by several factors. Selection pressure, which is usually applied in order to maintain preferred traits and discard undesirable traits, can inadvertently decrease the allele frequency of unselected traits. Because the allele frequency decreases, the probability of obtaining all combinations via segregation also becomes smaller, thus allowing high levels of segregation distortion to occur. Fawcett et al. (2013) [33] have explained how an important rice chromosome segment may be swept away due to selection pressure. According to these authors, cultivated species often show a reduction in genome-wide genetic variation due to a bottleneck in the initial phase of domestication. This is followed by a reduction of polymorphisms that is significantly greater than what would be expected, because other genetic variants in neighboring regions are swept out due to the hitchhiking effect [33].
Segregation distortion may also be due to partial lethal factors or the presence of gametophytic or sterility genes, as well as the large proportion of non-functional pollen in indica/japonica crosses [34]. Segregation distortion is commonly observed in indica-japonica crosses, in which the direction of skewness varies in different populations and chromosomal regions [34]. In this study, Bio-148 is a distant progeny of tropical japonica rice, an IRAT-local African rice, through allele contribution from its donor parent, the Gajah Mungkur variety [Trijatmiko, personal comm.; 16]. As a progeny of IRAT112, the Gajah Mungkur variety carries chromosomal segments of IRAT112, which may have been inherited by Bio-148. Although it has been backcrossed to IR64 twice, this line (Bio-148, a BC2F8 line) is suggested to still carry part of the parental genome from IRAT112.

QTL identification.
Single marker regression (SMR) was used to identify QTLs associated with awn, incomplete panicle exertion and total spikelet number. SMR was implemented in QGene ver. 4.3.8 software [28]. A permutation test with 10,000 iterations at α = 0.05 was used to set the threshold for declaring a significant association/linkage between a marker and a QTL. According to Table 5, for each trait, the experiment-wise threshold corresponded to an LOD score at α = 0.05 that ranged between 2.673 and 2.74. A total of four QTLs for awn (1), incomplete panicle exertion (1) and total spikelet number (2) were detected on two rice chromosomes (Chromosomes 4 and 8) using SMR. SMR identifies a QTL on the basis of the difference between the mean phenotypes of the various marker genotype groups, but it cannot separate the estimates for the recombination fraction and the QTL effect [11].
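The logic of an experiment-wise permutation threshold can be illustrated with a short Python sketch. This is a generic version of the procedure, not QGene's code: phenotypes are shuffled relative to genotypes, the largest LOD across all markers is recorded for each shuffle, and the (1 - α) quantile of those maxima is taken as the threshold. The additive 0/1/2 genotype coding, the function names and the simulated data are assumptions, and fewer permutations than the 10,000 used in the study are run to keep the example fast.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """LOD of a linear regression of phenotype on allele dosage (0, 1, 2)."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or rss_null == 0:
        return 0.0  # no variation, so no evidence for a QTL
    rss_marker = rss_null - sxy * sxy / sxx
    if rss_marker <= 0:
        return math.inf
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wise LOD threshold from the permutation null distribution."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        max_lods.append(max(lod_score(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[index]


def simulate_toy_data(n_plants=60, n_markers=5, seed=1):
    """Random genotypes and phenotypes with no true QTL (illustration only)."""
    rng = random.Random(seed)
    markers = [[rng.choice((0, 1, 2)) for _ in range(n_plants)]
               for _ in range(n_markers)]
    phenotypes = [rng.gauss(0.0, 1.0) for _ in range(n_plants)]
    return markers, phenotypes


if __name__ == "__main__":
    toy_markers, toy_phenotypes = simulate_toy_data()
    print(permutation_threshold(toy_markers, toy_phenotypes))
```

A marker is then declared significant when its observed LOD exceeds this threshold.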
QTL for awn length. QTL analysis using SMR identified only one QTL on Chromosome 8, designated as awn.8.1, which was mapped on RM256 with a peak LOD value of 8.262 (Table 5).
QTL for incomplete panicle exertion. QTL analysis using SMR identified one QTL associated with incomplete panicle exertion, which was designated as ipe.4.1, on Chromosome 4. This QTL contributed 9.1% of phenotypic variation, and the IR64 allele at this locus increased the proportion of incomplete panicle exertion by 0.025 (Table 5).
It has been reported that a gene for awn was first described in 1963 [35]. Then, three genes, An-1, An-2 and An-3, were also identified and mapped to Chr 3, Chr 4 and Chr 5. Since then, a total of 31 loci have been found to be associated with awn presence in rice [36]. In this study, a QTL was identified on Chromosome 8 and designated as awn.8.1. This is suggested to be a new QTL for awn in rice. The differences in the locations of awn loci among studies may be due to the different types of genetic populations tested and the number of markers used.
The correlation analysis in this study revealed that the awn trait has a significant negative correlation with total spikelets per panicle, suggesting that the presence of the QTL for awn within a certain chromosomal region will decrease the spikelet number per panicle. Luo et al. (2013) [37] found that An-1, a major quantitative trait locus, regulates awn development and grain elongation and affects grain number per panicle. The lower average for total spikelets per panicle in the F2 population as compared to P2 in this study may be due to the fixation of the IR64 allele of the awn gene, since IR64, as the recurrent parent, contributes traits to the F2 individuals. The higher the frequency of backcrossing towards IR64, the greater the number of IR64 alleles in the progeny. Unfortunately, this trait has a negative correlation with yield. According to Luo et al. (2013) [37], An-1 is a multifunctional gene with pleiotropic roles in rice development and is related to promoting cell division in rice. The upregulation of An-1 expression during the early stage of inflorescence formation may lead to the down-regulation of LOG expression [37]. This may reduce meristem activity and then reduce grain number per panicle and yield per plant [37]. Nevertheless, despite this negative effect, the awn has an important function as an aid in seed dispersal and seed burial and in protecting cereal grains from animal predation [25]. Although awns have been selected against over the long period of domestication, awns contribute significantly to photosynthesis and yield [38,37].
Another result of this study is the identification of a quantitative locus associated with incomplete panicle exertion, with a major QTL being designated as ipe.4.1, on Chromosome 4. Panicle exertion seems to be related to many other traits in rice [38,24]. According to Zeng et al. (2009) [39], panicle exertion is a cold-tolerance-related trait at the booting stage and has a significant association with spikelet fertility and all morphological traits, including plant height, node length under the spike, leaf length, leaf width, spike length, full grains per spike and total grains per spike. Incomplete panicle exertion has been cited as a symptom of cold injury in many countries [24]. Furthermore, panicle exertion is also an indicator of a genotype's adaptability to cool temperatures [24]. Genes controlling the trait are moderately heritable, meaning that the heritability value (in the narrow or broad sense) is less than 50% and that both parents may contribute genes in a complementary manner [24]. Han et al. (2006) [40] reported that there were 44 QTLs for panicle exertion located on almost all chromosomes, except Chromosome 12. In another study, using a doubled haploid population, Hittalmani et al. (2003, 2002) [41,38] mapped eight QTLs for panicle exertion on rice Chromosomes 4 and 11. In contrast to both of these, our research located a QTL for incomplete panicle exertion rather than panicle exertion. We identified IR64 as the parent whose allele at ipe.4.1 on Chromosome 4 contributed to increasing incomplete panicle exertion in individuals of the segregating F2 population.
The correlation analysis in this study also revealed that incomplete panicle exertion has a negative correlation with yield. The high percentage of incomplete panicle exertion, according to this study, can be a major cause of yield loss in Bio-148 due to the reduction in the number of grains available to be harvested. Considering that IR64 is the contributing parent of this undesired trait but also the genetic background for Bio-148 improvement, care should be taken when applying selection for this trait.
Spikelet number per panicle, as a key component of grain yield, has become a major concern in rice breeding research programs for yield potential improvement [42,43]. Several genes related to spikelet number per panicle and to spikelet regulation have been isolated, such as MOC-1, a putative GRAS family member, which leads to fewer branches per panicle [44]; FZP, a positive regulator of floral meristem identity that can suppress the formation of axillary meristems of rice spikelets [45]; and LAX-1, a regulator of rachis-branch meristem initiation and/or maintenance during panicle development [45]. Other genes underlying QTLs responsible for increasing the number of rachis branches in a panicle have also been identified, such as DEP1, OsCKX2, SB, PB, DEP2, WFP and APO1 [46]. Recently, qGP5-1 has been cloned and characterized. A newly identified gene, OsEBS (enhancing biomass and spikelet number), has been found to control rice biomass and spikelet number, i.e., it increases plant height, leaf size, and spikelet number per panicle, leading to an increase in total grain yield per plant [47].
We have identified QTLs for spikelet number, designated as tsn.4.1 and tsn.4.2, on Chromosome 4. Marathi et al. [48] reported a total of twelve QTLs on seven chromosomes affecting spikelets per panicle (SPP), with a major QTL, qSPP4-1, in the RM3276-RM5709 marker interval with LOD 6.58-13.84, which was identified on Chromosome 4 and explained 9-16% of phenotypic variation [48]. The different numbers of quantitative loci for spikelet number between Marathi's study and our study may be caused by the different types of genetic populations tested and the number of markers used. We also confirmed the presence of tsn.4.1, with peak marker RM17403 on Chromosome 4, which is an additional QTL for total spikelet number in rice. The QTLs identified in this study will be useful for the improvement of yield potential in modern lowland indica rice varieties by harnessing the hidden useful alleles from upland tropical japonica rice varieties.

Conclusions
The present study reveals that there is one QTL for awn length (awn.8.1), one QTL for incomplete panicle exertion (ipe.4.1) and two QTLs for spikelet number per panicle (tsn.4.1 and tsn.4.2) in the tested population. The information generated in the present study will be useful for fine mapping and/or identifying the genes underlying major robust QTLs.