QTL mapping for yield components and agronomic traits in a Brazilian soybean population

The objective of this work was to map QTL for agronomic traits in a Brazilian soybean population. For this, 207 F2:3 progenies from the cross CS3035PTA276-1-5-2 x UFVS2012 were genotyped and cultivated in Viçosa-MG, using randomized block design with three replications. QTL detection was carried out by linear regression and composite interval mapping. Thirty molecular markers linked to QTL were detected by linear regression for the total of nine agronomic traits. QTL for SWP (seed weight per plant), W100S (weight of 100 seeds), NPP (number of pods per plant), and NSP (number of seeds per plant) were detected by composite interval mapping. Four QTL with additive effect are promising for marker-assisted selection (MAS). Particularly, the markers Satt155 and Satt300 could be useful in simultaneous selection for greater SWP, NPP, and NSP.


INTRODUCTION
Soybean is by far the main export product in Brazil, and it is grown almost everywhere in the country. In the 2014/2015 season, its production amounted to 96.243 million tons, which corresponds to 46% of the total grain production in the country (CONAB 2015). Due to its economic importance in the Brazilian agricultural scenario, soybean breeding programs seek to develop more productive cultivars for the various Brazilian conditions.
Traditionally, productive parental lines are used to obtain new cultivars (Carter et al. 2004), which contributes to the narrowing of the genetic base of the improved germplasm and makes further gains in yield difficult to obtain. In addition to the low variability in the improved germplasm (Hyten et al. 2006), other factors also hinder the selection of productive cultivars, such as the environmental influence (Ainsworth et al. 2012), which reduces the efficiency of selection of superior genotypes, and the negative correlation between grain yield and protein content (Popovic et al. 2012), since another purpose of breeding programs is to increase the protein content of the grain. In view of these difficulties, knowledge of the genetic control of grain yield may guide more effective selection strategies, such as marker-assisted selection (MAS).
In soybeans, several QTL have recently been mapped for traits related to grain yield, although many of them may be repetitive. Altogether, at least 99
JIS Rodrigues et al.
According to the studies mentioned above, the genetic control of grain yield components involves several loci and is under strong environmental influence. For this reason, molecular markers have been used to identify and locate QTL for strategies involving MAS, which can be useful in breeding by allowing indirect selection of agronomic traits from early generations and early stages of plant development. By using MAS, unfavorable alleles can be eliminated or greatly reduced in early generations, so that only a small number of plants needs to be evaluated and selected in the field. In another application, MAS can facilitate the introgression of favorable alleles from nonadapted germplasm sources into commercial materials.
In spite of the number of QTL available in the literature for grain yield components in soybean, none of the QTL has been mapped for Brazilian cultivars and/or in Brazilian soil and climate conditions. Thus, this study aims to map QTL for yield components and agronomic traits in an F2:3 soybean population derived from the cross between a line and a Brazilian cultivar, with a view to implementing MAS in breeding programs in the country.

MATERIAL AND METHODS
A population of 207 F2 plants was obtained from the cross between the line CS3035PTA276-1-5-2 and the variety UFVS2012. The F2 plants were cultivated in a greenhouse, and young leaves were collected, frozen in liquid nitrogen, and stored at -80 °C for subsequent DNA extraction. Phenotypic evaluation was carried out in F3 families cultivated in the field, using a randomized block design with three replications. In each block, phenotypic values were collected from five individuals per family. Cultivars BARC-8 and Monarca, and the parents CS3035PTA276-1-5-2 and UFVS2012, were also cultivated as controls. This trial was carried out in the Experimental Field Diogo Alves de Mello (lat 20° 45′ S, long 42° 52′ W, alt 650 m asl), Viçosa, in the state of Minas Gerais, Brazil. The soil of the region is classified as clayey dystrophic red-yellow latosol.
In the F3 generation, the following agronomic traits were evaluated: number of days to flowering (NDF), from emergence until 50% of the plants in the row had flowered; number of days to maturity (NDM), from emergence until 95% of the pods showed mature color; plant height at maturity (PHM), expressed in centimeters, from ground level up to the last node on the main stem; height of the first pod (HFP), expressed in centimeters; number of nodes at maturity (NNM), counted on the main stem from the cotyledon node, at the R8 stage; number of pods per plant (NPP); seed weight per plant (SWP); weight of one hundred seeds (W100S); and number of seeds per plant (NSP). An analysis of variance was carried out for each variable using the GENES software (Cruz 2013). The statistical model used (family trial with intercalated controls) is illustrated below.
$$Y_{ij} = m + g_i + b_j + \varepsilon_{ij}$$
in which: $Y_{ij}$ is the value of the trait for the i-th treatment in the j-th block; $m$ is the general mean of the treatments; $g_i$ is the effect of the i-th treatment (i = 1, 2, ..., t); $b_j$ is the random block effect (j = 1, 2, ..., r); and $\varepsilon_{ij}$ is the random error, with $\varepsilon_{ij} \sim NID(0, \sigma^2)$.
Broad-sense heritability ($h^2$) and the coefficient of experimental variation ($CV_e$) were estimated from the mean squares of the analysis of variance, with $CV_e = 100\,(MSF)^{0.5}/\mu$, in which: MSF and MSR refer to the mean square of the families and the mean square of the residue, respectively, and $\mu$ is the general mean. The components of residual variance ($\sigma^2$), of genotype ($\sigma^2_g$) and of controls ($\Phi_{te}$) were estimated from MSR, MSF, MSC and r, in which: MSR is the mean square of the residue; MSF is the mean square of the families; MSC is the mean square of the controls; and r is the number of replications. The hypothesis of normal distribution of the nine traits evaluated was tested by the Lilliefors test.
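As an illustration of how such components can be derived from the mean squares, the sketch below assumes the usual expected mean squares for a randomized block design with families as random effects and controls as fixed effects: σ² = MSR, σ²g = (MSF - MSR)/r, Фte = (MSC - MSR)/r, and h² on a family-mean basis equal to σ²g/(MSF/r). These forms, the function name and the input values are assumptions made for illustration; they are not taken from GENES or from Table 1.

```python
def variance_components(msf, msr, msc, r):
    """Moment estimators from ANOVA mean squares (assumed standard forms)."""
    sigma2 = msr                   # residual variance
    sigma2_g = (msf - msr) / r     # genotypic variance among families
    phi_te = (msc - msr) / r       # quadratic component of the (fixed) controls
    h2 = sigma2_g / (msf / r)      # family-mean heritability = (MSF - MSR) / MSF
    return {"sigma2": sigma2, "sigma2_g": sigma2_g, "phi_te": phi_te, "h2": h2}


# Hypothetical mean squares, for illustration only (not values from the study)
est = variance_components(msf=30.0, msr=6.0, msc=18.0, r=3)
print(est)
```

Under these assumptions, heritability depends only on the ratio between MSR and MSF, so traits with a small residual mean square relative to the family mean square will show high h².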
DNA extraction was carried out by the CTAB method, according to Doyle and Doyle (1990). The microsatellite markers used in the experiment were developed by Cregan et al. (1999), and the respective primer pair sequences are available in the SoyBase database (http://soybase.org/). PCR reactions were carried out in a total volume of 15 μL containing: Tris-HCl 10 mM, pH 8.3; KCl 50 mM; MgCl₂ 2 mM; Triton X-100 0.1%; 100 μM of each deoxynucleotide; 0.3 μM of each microsatellite primer; one unit of Taq DNA polymerase (Phoneutria); and 30 ng of DNA. PCR reactions consisted of an initial step at 94 °C for 4 min; 30 cycles of 94 °C for 1 min, 55 °C for 1 min, and 72 °C for 2 min; and a final step at 72 °C for 7 min. Amplification products were separated by electrophoresis in 10% polyacrylamide vertical gels, using 1X TAE buffer (40 mM Tris-acetate and 1 mM EDTA), with a running time of three hours at 140 volts. Gels were stained with ethidium bromide (1 µg mL⁻¹) and recorded with an Eagle Eye II photodocumentation device (Stratagene).
The individual segregation of the molecular markers was tested by the chi-square test using the Bonferroni correction (p < 0.05). Molecular markers with the expected segregation (1:2:1) were grouped using a minimum LOD score of 3.0 and a maximum recombination frequency of 30%. The Kosambi mapping function was used as the distance measure, and the Rapid Chain Delineation (RCD) algorithm (Doerge 1996) was used to define the order of the markers in the linkage groups. The genetic map was constructed using the GQMOL software (http://www.ufv.br/dbg/gqmol/gqmol.htm).
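The segregation test and the distance measure described above can be sketched as follows. This is a minimal illustration rather than the GQMOL implementation: the function names and genotype counts are hypothetical, and the Bonferroni threshold is assumed to be α divided by the number of markers tested. For 2 degrees of freedom the chi-square p-value has the closed form exp(-χ²/2), so no statistical library is needed.

```python
import math


def chi2_121(n_aa, n_ab, n_bb):
    """Chi-square test of a codominant marker against the 1:2:1 F2 ratio."""
    n = n_aa + n_ab + n_bb
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * n, 0.50 * n, 0.25 * n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-stat / 2.0)  # exact upper tail for 2 degrees of freedom
    return stat, p_value


def is_distorted(p_value, n_markers, alpha=0.05):
    """Bonferroni criterion (assumed form): compare p with alpha / n_markers."""
    return p_value < alpha / n_markers


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination frequency 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical genotype counts for one marker scored in 207 plants
stat, p = chi2_121(50, 105, 52)
print(round(stat, 4), is_distorted(p, n_markers=48))
print(round(kosambi_cm(0.30), 2))  # distance at the grouping limit r = 0.30
```

Under the Kosambi function, the grouping limit of r = 0.30 corresponds to a map distance of 25 ln 4, roughly 35 cM.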
The association between each trait and each molecular marker was determined by single marker analysis, using linear regression (i.e., by regressing the values of each quantitative trait on the scores assigned to the marker genotypes). The presence of QTL in the intervals of the linkage map was evaluated by composite interval mapping for the nine traits (Jansen 1993, Zeng 1994). Selection of cofactors was based on stepwise regression analysis, at 5% and 10% significance levels for the entry and removal of independent variables, respectively. For some of the intervals, the cofactors used were markers of the same linkage group (LG) that showed a significant association with the variable in the single marker analysis. The estimates of the additive and dominance values, the coefficient of determination corresponding to the peak of highest statistical significance of the QTL, and the position of each QTL were declared when the maximum likelihood ratio (LR) values exceeded the critical cut-off values (α = 0.05) in each LG. The critical LR values were determined by performing 1000 permutations. QTL analyses were carried out with the GQMOL software (http://www.ufv.br/dbg/gqmol/gqmol.htm).
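The single marker regression and the permutation idea can be sketched as below. This is a simplified stand-in, not the GQMOL procedure: it permutes trait values to obtain an empirical threshold for the single-marker F statistic, whereas the study applied permutations to the LR statistic of composite interval mapping. The genotype coding (0, 1, 2), the simulated data and all function names are assumptions for illustration.

```python
import random


def regression_f(scores, values):
    """Slope, F statistic (1 and n-2 df) and R^2 of trait values on marker scores."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    ss_reg = slope * sxy              # sum of squares explained by the marker
    ss_res = syy - ss_reg             # residual sum of squares
    f_stat = ss_reg / (ss_res / (n - 2))
    return slope, f_stat, ss_reg / syy


def permutation_threshold(scores, values, n_perm=1000, alpha=0.05, seed=0):
    """Empirical critical value: upper-alpha quantile of F under shuffled traits."""
    rng = random.Random(seed)
    shuffled = list(values)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)         # break any marker-trait association
        null_stats.append(regression_f(scores, shuffled)[1])
    null_stats.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return null_stats[index]


# Simulated example: 60 families, marker coded 0 (aa), 1 (Aa), 2 (AA)
rng = random.Random(1)
scores = [0, 1, 2] * 20
values = [10.0 + 2.0 * s + rng.gauss(0.0, 1.5) for s in scores]
slope, f_obs, r2 = regression_f(scores, values)
threshold = permutation_threshold(scores, values)
print(f_obs > threshold)
```

A marker would be declared associated with the trait when the observed statistic exceeds the permutation threshold; the same logic applies to LR values along the linkage map.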

RESULTS AND DISCUSSION
The analysis of variance of the agronomic traits indicated genetic variability for the nine traits at 1% probability, evidencing the potential of the population for QTL analysis (Table 1). The suitability of the population is also evidenced by the contrast observed between the means of the parents for most traits (Table 2). Although there was no difference between the means of the parents for the variables NDF, NDM and W100S, these traits presented transgressive segregation among the F3 families. The ratio CVg/CVe was high for all traits, especially for NDM, NNM and W100S, indicating that most of the variation was due to genetic variance. The coefficients of variation presented acceptable values, which were close to those observed by other authors in Brazilian conditions (Pires et al. 2012, Rocha et al. 2012, Barbosa et al. 2013, Torres et al. 2014), indicating good precision in controlling the causes of experimental variation. Broad-sense heritability values across traits ranged from 44.92 to 86.59%, including medium and high values. Estimates close to those found for the same yield traits are reported in other studies on soybeans (Pooprompan et al. 2006, Vieira et al. 2006, Kim et al. 2012). Higher heritability coefficients were observed for the traits NDM, NNM and W100S, indicating that the efficiency of phenotypic selection can be high, since heritability expresses the proportion of the total variance that is attributed to genetic variation (Falconer and Mackay 1996). High heritability values for the same traits are also reported by Eskandari et al. (2013) and Palomeque et al. (2009).
Forty-eight pairs of microsatellite primers were evaluated in the segregating F2 population. Segregation of each marker was tested using the chi-square test, and three molecular markers presented distorted segregation (Satt352, Satt429, and Satt454). However, these molecular markers did not occur in any of the intervals with significant association with the studied traits. In the linkage analysis, nine LG were obtained by the grouping of 25 microsatellite markers, representing part of the LG A1, B1, D1a, G, I, M, and O (Figure 1). The position of the molecular markers in each LG coincided with the consensus map of the species, with the exception of the marker Satt370, which was located in LG I, while in the reference map (Song et al. 2004) the same marker is located in LG D1a. As expected, 23 molecular markers were not grouped in any of the LG, since the distances between the markers were greater than 30 cM, according to the consensus map published by Song et al. (2004). The grouping of the molecular markers used LOD = 3 and r = 0.30 as criteria, but for eight of the nine LG, grouping also occurred with LOD = 5, which shows the consistency of the groups.
The association between traits and molecular markers was evaluated by means of linear regression, in which a significant effect indicates the existence of an association between a marker and a trait, due to genetic linkage between the marker and a QTL (Schuster and Cruz 2004). The associations detected by this method can be attributed to a QTL of major effect that is relatively distant from the marker, or to a QTL of minor effect that is close to the marker. In total, 59 significant associations between markers and traits were observed in the single marker analysis using linear regression (Table 3). Thirty molecular markers presented association with at least one variable, and the number of significant markers for each variable ranged from 2 to 13. Fifteen molecular markers also presented association with more than one trait. QTL for the respective traits in the same linkage groups have been reported (Reinprecht et al. 2006, Chen et al. 2007, Gai et al. 2007, Zhang et al. 2010, Liu et al. 2011, Kim et al. 2012). Other QTL related to these traits can also be found in the Glycine max genomic database (http://soybase.org/).
QTL analysis in the intervals of the linkage map was carried out by composite interval mapping. In this method, additional markers are included as cofactors in the analysis, increasing the accuracy of the estimates of position and effect of the QTL. The inclusion of these markers reduces the effects caused by other QTL outside the interval, which increases the power of the test, according to Schuster and Cruz (2004).
In the QTL analysis by composite interval mapping, five QTL were detected for four of the nine studied traits (Table 4). Two QTL were identified for SWP, in the linkage groups A1 and D1a, which explained 12.32% and 9.03% of the total phenotypic variation, respectively (Table 4). Another QTL, associated with W100S, was mapped in LG I. This QTL, located in the region of the locus Satt239, explained 13.47% of the variation. Two other QTL, associated with NPP and NSP, were mapped between the markers Satt300 and Satt155 in LG A1, in the position of the QTL mapped for SWP. The variation in NPP and NSP explained by these QTL was 9.43% and 7.19%, respectively. These QTL are probably very close together in the region of the marker Satt300 or, alternatively, they may be a single QTL associated simultaneously with the three variables, according to the hypothesis of pleiotropic effect.
QTL related to yield in LG A1 and I have been reported by other authors. In the same region of the molecular marker Satt300, in LG A1, Chen et al. (2007) mapped one QTL, which explained 12.56% of the variation in seed weight per plant, with an estimated additive effect of 7.1 g. Another QTL associated with yield in the region of the same marker was mapped by Guzman et al. (2007), which explained 18% of the variation in yield, with an estimated additive effect of 60 kg ha⁻¹. Another QTL for yield in the same region was reported by Palomeque et al. (2009).
(2) homozygous for the parent P1 (CS3035PTA276-1-5-2) (AA), (0) homozygous for the parent P2 (UFVS2012) (aa), and (1) the respective heterozygote (Aa). NDF = number of days to flowering; NDM = number of days to maturity; PHM = plant height at maturity; HFP = height of the first pod; NNM = number of nodes at maturity; NPP = number of pods per plant; SWP = seed weight per plant (g); W100S = weight of one hundred seeds (g); and NSP = number of seeds per plant.
In LG I, Sebolt et al. (2000) mapped one QTL related to grain yield in two environments in the region of the marker Satt127, at approximately 2 cM from the interval of the QTL mapped in LG I in the present study, according to the reference map (Song et al. 2004). Moreover, from the combined data of five environments, Csanádi et al. (2001) mapped one QTL for the weight of one thousand seeds in the region of the marker Satt562, at approximately 5 cM from the same interval. Reinprecht et al. (2006) also observed an association of the molecular marker Satt367 with the weight of one hundred seeds in three environments. In addition, another marker in LG I, Satt354, located at approximately 18 cM from Satt367, was identified in two of the environments, according to the authors. Another QTL in LG I related to grain yield (g plant⁻¹) is reported by Du et al. (2009) between the molecular markers Satt102 and Sat419, at approximately 5 cM from the QTL described for W100S in the present study. Palomeque et al. (2009, 2010) also found another marker in LG I associated with grain yield (kg ha⁻¹) in multiple environments (Satt162). However, this marker is located in another region of LG I. In addition, another molecular marker related to yield was found in LG D1a (Sat036), as reported by Orf et al. (1999), explaining 6% of the variation in seed weight (mg seed⁻¹). This marker is found at approximately 30 cM from the interval of the QTL mapped in this study.
In the present study, several molecular markers associated with nine agronomic traits were identified, evidencing the large number of QTL involved in the control of the traits and the complexity of the genetic basis of grain yield in soybeans. Although several QTL related to grain yield have been described in the scientific literature, possibly only a few of them have been confirmed (Kassem et al. 2006, Bernardo 2008, Sebastian et al. 2010). QTL that have already been selected in breeding programs and/or transferred to new cultivars are rare (Ainsworth et al. 2012). Thus, further studies are still necessary before additive QTL can be manipulated by MAS.
LG = linkage group according to Song et al. (2004); R² = proportion of phenotypic variation explained by the QTL; a = estimated additive effect for the QTL; d = estimated dominance effect for the QTL; *, ** LR values that exceed the critical cut-off values at 5% and 1% probability, respectively. Critical LR values were determined by 1000 permutations. SWP = seed weight per plant (g); W100S = weight of one hundred seeds (g); NPP = number of pods per plant; and NSP = number of seeds per plant.
SD = standard deviation; Min/Max = minimum and maximum value of the trait; NDF = number of days to flowering; NDM = number of days to maturity; PHM = plant height at maturity; HFP = height of the first pod; NNM = number of nodes at maturity; NPP = number of pods per plant; SWP = seed weight per plant (g); W100S = weight of one hundred seeds (g); and NSP = number of seeds per plant.

Figure 1. Linkage groups from the analysis of the F2 population with 48 microsatellite markers, based on the criteria LODmin = 3.0 and rmax = 0.30.

Table 1. Analysis of variance, means, coefficients of variation, and genetic parameters for agronomic traits in the F3 generation of the cross between CS3035PTA276-1-5-2 and UFVS2012
*, ** Significant at 5 and 1% probability by the F test, respectively; NDF = number of days to flowering; NDM = number of days to maturity; PHM = plant height at maturity; HFP = height of the first pod; NNM = number of nodes at maturity; NPP = number of pods per plant; SWP = seed weight per plant (g); W100S = weight of one hundred seeds (g); and NSP = number of seeds per plant.

Table 3. QTL analysis by single marker for agronomic traits in an F2:3 population derived from a cross between CS3035PTA276-1-5-2 and UFVS2012
*, ** Significant at 5 and 1% probability by the F test, respectively.

Table 4. QTL analysis by composite interval mapping for agronomic traits in an F2:3 soybean population