Identification of superior genotypes and soybean traits by multivariate analysis and selection index

The selection of superior soybean genotypes is a complex process, so exploratory multivariate techniques can be applied to select genotypes by analyzing the agronomic traits jointly, increasing the chance of success of a breeding program. The objective of this study was to select soybean genotypes carrying the RR gene with good agronomic performance through multivariate analysis and a selection index, to identify the traits that most influence selection, and to verify the agreement between the multivariate techniques and the selection index in the selection process. The experiment was conducted in an augmented block design, in which 227 genotypes of the F5 generation were evaluated, 85 of which were identified as glyphosate-resistant by PCR. The following traits were evaluated: number of days to maturity, plant height at maturity, lodging, agronomic value, number of branches per plant, number of pods per plant, hundred-seed weight, and grain yield. The principal components analysis resulted in the selection of sixteen genotypes with higher grain yield. The traits related to the yield components exerted great influence on grain yield. The clusterings by the K-means and Ward's methods were similar because they placed the genotypes that were specific for the traits selected in the principal components analysis in the same group. The results of the multivariate analysis agreed with those of the Mulamba and Mock selection index regarding the selected genotypes. The methodologies applied are efficient for selecting genotypes.


INTRODUCTION
The soybean crop is a very important part of Brazilian agriculture. In 2016/17, Brazil produced 113.92 million tons of soybean (CONAB, 2017), and the area sown with genetically modified cultivars reached 96.51% of the total cultivated area, corresponding to 32.7 million hectares (CÉLERES, 2016). The emergence of soybean cultivars resistant to the Roundup herbicide significantly changed weed management in these crops by providing benefits such as efficient control and management flexibility in the post-emergence stage (MATSUO et al., 2009; MENEGATTI; BARROS, 2007).
Selecting superior soybean genotypes is a complex process because the economically important agronomic traits are quantitative by nature and are correlated with each other (ALMEIDA; PELUZIO; AFFERRI, 2010; NOGUEIRA et al., 2012). Promising genotypes should simultaneously combine a set of positive traits that can raise yield to meet market demand (CRUZ, 2013). Given the complexity of the most important plant traits, more efficient selection criteria are required.
Multivariate exploratory techniques can be used for selecting superior genotypes by simultaneously analyzing all the agronomic traits and the relationships among them, as well as by identifying which traits are the most influential in a selection process. Among the multivariate techniques, those that stand out are the principal components analysis and the hierarchical and non-hierarchical cluster analysis methods (HAIR et al., 2009).
The principal components analysis aims to reduce the dimensionality of the dataset, keeping as much relevant information as possible in the smallest number of components (BARBOSA et al., 2013; DALLASTRA et al., 2014; SILVA et al., 2010). The cluster analysis, on the other hand, aims to classify similar individuals into the same group while keeping the groups heterogeneous among themselves (VIANNA et al., 2013). Genotype selection studies have been performed with these techniques on several different crops, according to Bertini et al. (2010), Dallastra et al. (2014), and Vianna et al. (2013). In addition, multivariate methods have also been used in soybean genetic divergence studies (FERREIRA et al., 2015; PELUZIO et al., 2012; VILLELA et al., 2014).
Selection indexes are methodologies that can identify superior genotypes. They enable researchers to select efficiently for a set of economically important agronomic traits, increasing the success of the breeding program (CRUZ, 2013; ROSADO et al., 2012).
The use of indexes to select soybean genotypes can promote better total gains, distributed among all the evaluated traits, which is adequate for breeding programs (COSTA et al., 2004). Selection indexes were used in recent studies of soybean traits (BIZARI et al., 2017; LEITE et al., 2016). Notably, the Mulamba and Mock (1978) sum of ranks index consists of ranking the genotypes for each trait in the order favorable to improvement, and it has the advantage of requiring neither economic weights nor variance and covariance estimates (REZENDE et al., 2014).
Based on these considerations, the present study aims to select Roundup Ready soybean genotypes with positive agronomic traits, using multivariate analyses and a selection index. Furthermore, it intends to identify which traits influence selection the most, to evaluate the relationships among these traits, and to verify the agreement between the multivariate techniques and the selection index in the selection process.

Agronomic analyses
The experiment was conducted in the 2013/2014 agricultural year at the Jaboticabal Campus of UNESP/FCAV (São Paulo State University), located in the northern region of São Paulo State (21°15'22" S latitude and 48°18'58" W longitude). The genotypes evaluated in this study were obtained by artificial hybridization between conventional parents (female parents), which are lines of the FCAV/UNESP Jaboticabal breeding program, and parents carrying the RR gene (male parents), which are commercial cultivars. In total, twenty crosses were assessed.
Treatments consisted of 227 F5 generation soybean lines, regardless of whether they carried the RR gene, since up to that generation they had not been selected for glyphosate resistance. The augmented block design proposed by Federer (1956) was used, in which each plot consisted of one 5 m row with 0.5 m spacing. Two check cultivars (CD-216 and Vmax) were used as interspersed controls.
The following agronomic traits were assessed: number of days to maturity (NDM); plant height at maturity (PHM), in cm; lodging (L), on a visual scoring scale varying from 1 (all plants in the plot erect) to 5 (all plants in the plot lodged); agronomic value (AV), on a visual scoring scale varying from 1 (undesirable plants) to 5 (good plants); number of branches per plant (NB); number of pods per plant (NP); weight of 100 seeds (WHS), in g; and grain yield (GY), converted into kg ha⁻¹.

Molecular analyses
The Roundup Ready (RR) soybean genotypes were detected by PCR, which distinguishes genotypes through the presence of molecular markers resulting from the amplification of specific transgenic DNA sequences. Genotypes containing the RR gene were identified with specific primers, forward 5'-TGATGTGATATCTCCACTGACG-3' and reverse 5'-TGTATCCCTTGAGCCATGTTGT-3', which target the EPSPS region. The amplified RR fragment has a size of 172 bp (MARCELINO; GUIMARÃES; BARROS, 2007).
The genomic DNA samples were extracted from the tissue of young trifoliate leaves, using the CTAB method described by Ferreira and Grattapaglia (1995). The PCR reactions and the electrophoretic separation of the amplified fragments in agarose gels were conducted according to Silva et al. (2015).
Table 1 lists the crosses and parents used to form the segregating populations, as well as the number of soybean genotypes carrying the RR gene that were characterized and identified in the F5 generation for each cross. Among all the assessed genotypes, those from crosses 14 and 15 did not carry the RR gene.

Statistical analyses
The soybean genotypes containing the RR gene were coded from 1 to 85. Further selection was conducted based on the assessed agronomic traits.
Data were standardized to a mean of zero and a variance of one for all variables, as recommended in multivariate analysis (STATSOFT, 2004). Given the dependence structure and complexity of the original set of variables, the multivariate analyses of principal components, Ward's hierarchical clustering, and K-means non-hierarchical clustering were performed for the RR genotypes. These analyses were intended to improve the discrimination of superior genotypes, to compare the results of the exploratory analyses, and to test the agreement among the results of all the applied methods. All multivariate statistical analyses were performed using the STATISTICA v.10 software (STATSOFT, 2004).
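The standardization and principal components steps described above can be illustrated with a minimal Python sketch. It is not the STATISTICA procedure used in the study: the function names are hypothetical, NumPy is assumed to be available, and the data matrix is random placeholder numbers with the same shape as the study's dataset (85 genotypes by 8 traits).

```python
import numpy as np

def standardize(X):
    """Center each trait (column) to mean 0 and scale it to variance 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_from_correlation(X):
    """Principal components of standardized data via the correlation matrix."""
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)        # trait-by-trait correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh is appropriate: R is symmetric
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                    # genotype coordinates on each component
    loadings = eigvecs * np.sqrt(eigvals)   # correlation of each trait with each component
    return eigvals, loadings, scores

# Placeholder data only (random numbers), NOT the measurements of the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(85, 8))

eigvals, loadings, scores = pca_from_correlation(X)
explained = 100 * eigvals / eigvals.sum()   # percentage of total variance per component
retained = eigvals > 1.0                    # components with eigenvalue above one
```

With standardized data the eigenvalues sum to the number of traits, so each eigenvalue divided by eight gives the share of total variance retained by that component.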
Genotypes were selected by the principal components analysis according to criteria in which the scales of the graphs were pre-determined by ellipses on the graph axes, enabling precise discrimination of progenies based on their different traits, according to Dallastra et al. (2014) and Vianna et al. (2013). The lowest scale is the least stringent and consists of an X axis varying from -2 to 2 and a Y axis varying from -2 to 2. The second scale, more stringent, varies from -3 to 3 on the X axis and from -3 to 3 on the Y axis. This pattern continues for all biplot graphs. Plant breeders therefore have to decide on the level of selection intensity to use in the analysis and, hence, on the number of individuals selected for the traits of interest. Ward's clustering generated a dendrogram that provided prior information about the number of groups formed by the set of genotypes, based on the established between-group linkage distance. The dissimilarity between genotypes was measured by the Euclidean distance (HAIR et al., 2009).
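As an illustration of Ward's criterion with Euclidean distance, the following plain-Python sketch merges, at each step, the two groups whose union least increases the within-group sum of squares. It is a naive implementation written for readability, with hypothetical function names and made-up toy points; the study itself used STATISTICA, and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would usually be preferred.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Mean point of a group."""
    return [sum(col) / len(points) for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-group sum of squares if groups a and b are merged."""
    na, nb = len(a), len(b)
    return (na * nb) / (na + nb) * euclidean(centroid(a), centroid(b)) ** 2

def ward_clusters(points, n_groups):
    """Agglomerate until n_groups remain; returns lists of point indices."""
    groups = [[i] for i in range(len(points))]   # start with one group per genotype
    while len(groups) > n_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                cost = ward_cost([points[k] for k in groups[i]],
                                 [points[k] for k in groups[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]        # merge the cheapest pair
        del groups[j]
    return [sorted(g) for g in groups]

# Made-up standardized values for six genotypes on two traits (illustration only).
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
       [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
groups = ward_clusters(toy, 2)
```

Cutting the hierarchy at a chosen number of groups, as done here with `n_groups`, plays the same role as reading the number of groups off the dendrogram at the adopted linkage distance.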
In the K-means cluster analysis, we predetermined that 5 groups of RR soybean genotypes would be formed. The points on the graphs indicate the "centers" of the groups for the analyzed traits. The number of groups was determined from the prior information provided by the dendrogram, that is, from the number of groups formed at the adopted linkage distance. With K-means, we aimed to minimize the distance between each point and the centroid of its group (HAIR et al., 2009).
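The K-means idea of minimizing the distance between each point and its centroid can be sketched with Lloyd's algorithm in plain Python. This is a simplified illustration, not the STATISTICA implementation: the function names, the deterministic choice of starting centroids, and the toy data are assumptions of this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Centroid (mean) of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: alternate group assignment and centroid update."""
    n = len(points)
    # Assumption of this sketch: start from k evenly spaced rows of the data.
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:                      # keep the old centroid if a group is empty
                centroids[c] = mean_point(members)
        if new_labels == labels:             # assignments stable: converged
            break
        labels = new_labels
    return new_labels, centroids

# Made-up standardized values for six genotypes on two traits (illustration only).
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
       [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels, centroids = k_means(toy, 2)
```

For a dataset like the one in this study, the call would be `k_means(Z, 5)`, where `Z` is a hypothetical list of 85 rows of standardized trait values; the rows of `centroids` then correspond to the group "centers" plotted for each trait.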
For the RR soybean genotypes, the sum of ranks selection index described by Mulamba and Mock (1978) was used with a selection intensity of 19%, aiming to select the same number of genotypes discriminated by the principal components analysis.

RESULTS AND DISCUSSION
In the principal components analysis, three eigenvalues were higher than one, explaining 67.58% of the variance contained in the eight original variables. Eigenvalues above one (1.0) generate components that retain a significant amount of the information in the original variables (BERTINI et al., 2010; DALLASTRA et al., 2014). Components with eigenvalues below one retain too little information to be relevant.
The eigenvalue of the first principal component (PC1) was 2.47, which corresponds to 30.92% of the total variance. The main variables that explained the retained variance were PHM, L, and AV, with correlations with PC1 above 0.5 in absolute value (Table 2), since values above this magnitude, regardless of sign, contribute the most to the generated component. The eigenvalue of the second principal component (PC2) was 1.81, which retained 22.63% of the total variance, explained especially by the traits NB, NP, and GY. The third principal component (PC3) retained 14.03% of the variance, explained mainly by the NB variable. Within each principal component, equal signs mean that the traits are positively correlated, and opposite signs mean that the traits are negatively correlated (HAIR et al., 2009).
The two-dimensional plane formed by PC1 (30.92%) and PC2 (22.63%) retained a total of 53.55% of the original variance (Figure 1A), the highest retained variance among the pairs of relevant components.
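The percentages reported above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes, as is usual for standardized data, that the total variance equals the number of traits (eight); the helper name is hypothetical. Small differences from the reported values arise because the eigenvalues in the text are rounded to two decimals.

```python
N_TRAITS = 8  # eight standardized traits, so the total variance is 8

def percent_variance(eigenvalue, n_traits=N_TRAITS):
    """Share of the total variance retained by one component, in percent."""
    return 100.0 * eigenvalue / n_traits

pc1 = percent_variance(2.47)   # close to the reported 30.92 %
pc2 = percent_variance(1.81)   # close to the reported 22.63 %

# Variance retained by each plane, using the percentages reported in the text.
plane_pc1_pc2 = 30.92 + 22.63          # plane of Figure 1A
plane_pc1_pc3 = 30.92 + 14.03          # plane of Figure 1B
first_three = 30.92 + 22.63 + 14.03    # three components with eigenvalue above one
```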
This variance is explained by the traits PHM, L, AV, NB, NP, and GY, and it enabled us to choose the specific genotypes 11, 22, 28, 30, 31, 34, 36, and 63, and the highly specific genotypes 20, 29, and 32 for the traits NB, NP, and GY. The highly specific genotypes are those that are well discriminated by one or a few traits and are clearly superior for them, since each of those traits exerts a strong influence. The specific genotypes, on the other hand, are those discriminated by several traits. The two-dimensional plane formed by PC1 (30.92%) and PC3 (14.03%) retained 44.95% of the total variance (Figure 1B), and genotypes 12, 13, 61, and 62 were selected, discriminated by the traits GY, WHS, NDM, PHM, and L. Genotypes 12 and 13 exhibited superior GY and WHS, as well as lower NDM, PHM, and L, which is important and favorable for selection. Genotypes 61 and 62 exhibited superior GY and PHM but also a higher level of L, indicating the strong relationship between PHM and L.
A large group of genotypes was positioned at the center of the graphs, meaning that they exhibit average values for all the traits, which hinders the identification of the most influential traits. Their trait values are therefore within the range that classifies them as regular (not superior), which also shows the homogeneity of that group.
The genotypes located within the established scales are specific for the agronomic traits that influence them (Figures 1A and 1B). The genotypes located at the extremities of the graphs, on the other hand, perform better than the others in a few traits, for which they are highly specific. Therefore, genotypes 20, 29, and 32 are highly specific for the traits NB, NP, and GY because of their superior phenotypic values.
Figure 1A shows the dispersion of the trait vectors, drawn in proportion to one another. For the selected genotypes, the NB and NP traits exhibit a highly positive correlation with GY and are positioned in the same graph quadrant. These results agree with those obtained in other studies (NOGUEIRA et al., 2012; VIANNA et al., 2013). Such traits are yield components, i.e., the larger their values, the higher the soybean yield of the genotype, which makes the selection of superior genotypes more efficient. According to the projections of the GY, NDM, PHM, and L trait vectors (Figure 1B), genotypes 12 and 13, besides having the highest soybean yield, also exhibit earlier maturity, shorter plants, and less lodging.
Through the number of pods, plant height at maturity has a large indirect effect on soybean yield. Thus, this trait can be used in breeding for the indirect selection of soybean yield (ALCANTARA NETO et al., 2011; ALMEIDA; PELUZIO; AFFERRI, 2010).
Figure 1B shows that the WHS trait was the most important for high soybean yield, since both are strongly associated and positioned in the same graph quadrant. This can be explained by the fact that genotypes have higher WHS when their seeds are larger, which follows from the smaller number of seeds produced per plant, since these genotypes were almost unaffected by the yield components NB and NP. According to Dallastra et al. (2014), the lower the number of pods, the lower the number of seeds per plant and the higher the weight of 100 seeds.
Not all crosses were represented in the group of selected genotypes. Analyzing the parents involved in the crosses, we observed that the female parents JAB.00-05-5/4A2D and JAB.00-06-2/3I3D were the most efficient, with 6 and 5 selected RR genotypes, respectively. Among the male parents, M 8230 RR was the most effective, with 5 selected genotypes. According to Gonçalves et al. (2014), selecting the parents used to form segregating populations is a crucial step for the success of breeding programs, and this success depends largely on combining ability together with the presence of complementary genes.
Figure 2 shows the dendrogram generated by the Ward cluster analysis. In general, it shows two large groups, named groups A and B. At the adopted cutoff point, five subgroups were formed, two within group A (A1 and A2) and three within group B (B1, B2, and B3). The subgroups show the genetic differences among the genotypes of these populations for the studied traits: individuals of the same subgroup were considered very similar, while those from different subgroups were considered very distinct. Subgroup A2 contained only genotypes selected in the principal components analysis, since they are similar in grain yield, number of branches, and number of pods. Two of the genotypes selected by the principal components analysis were placed in subgroup B1, which represents genotypes with earlier maturity and a strong influence of 100-seed weight on grain yield, because these genotypes are highly similar for these traits.
Genotypes 28, 61, and 62, from subgroup A1, exhibited similarly high soybean yield and plant height at maturity and, consequently, high lodging scores. On the other hand, genotypes 11 and 34, from subgroup B2, presented similarly low plant height at maturity, low lodging scores, and high soybean yield.
Figure 3 shows the cluster analysis graph generated by the K-means method, with the 85 soybean genotypes and eight agronomic traits distributed into five groups (clusters). The classification of the genotypes is listed in Table 3. Group 4 (Figure 3) was characterized as above average for NDM, PHM, and L, and below average for agronomic value. Furthermore, markedly high values of NB and NP were found for this group. Thus, the genotypes placed in this group presented the highest GY of all the groups formed in this analysis. This reemphasizes that the combination of NB and NP directly determines the yield potential of a soybean genotype. Highly positive phenotypic correlations between these two traits were obtained in other studies (ALCANTARA NETO et al., 2011; ALMEIDA; PELUZIO; AFFERRI, 2010; LEITE et al., 2016).
Group 5 (Figure 3) was characterized by genotypes with values markedly lower than average for NDM, PHM, L, AV, NP, and GY. This group presented the earliest maturity and the lowest plant height at maturity. However, its yield was practically the same as that of the other groups, with the exception of group 4, which presented the highest yield.
Genotypes with the lowest plant height at maturity are associated with a shorter cycle because of the shorter duration of the vegetative and reproductive stages. Thus, the indeterminate growth habit is an extremely important trait in soybean genotypes, since it allows plants to reach a greater height at maturity even in early-cycle genotypes (MARQUES; ROCHA; HAMAWAKI, 2008).
Groups 4 and 5 exhibited the greatest dispersion of traits and, consequently, the greatest genetic variability among genotypes. They can be used in new hybridizations to form segregating populations with the objective of combining, in one individual, early maturity, taller mature plants, and larger yield component values. Consequently, such a plant would exhibit higher soybean yield. The other groups presented smaller variations, since their average soybean yield was similar.
The cluster analysis helps the breeder to choose a specific group of genotypes according to the desired trait, which enables the future use of these genotypes in commercial crops or as parents in hybridizations.
Similar results from multivariate analyses were obtained by Vianna et al. (2013), who studied the influence of agronomic traits on the selection of genotypes resistant to soybean rust, and by Dallastra et al. (2014), who studied the selection of soybean progenies in segregating populations descending from RR parents. Furthermore, Barbosa et al. (2013) concluded that both multivariate cluster and principal components analyses are efficient tools for describing seed lots.
Interestingly, the results obtained from the Ward hierarchical clustering method (dendrogram) were similar to the results obtained by the K-means non-hierarchical method (centroid profile). In addition, the results of the cluster analyses were similar to those obtained by the principal components analysis, because the clustering placed the highly specific genotypes, selected for the aforementioned traits, in the same group. This agreement supports the efficiency of the multivariate exploratory techniques in selecting soybean genotypes with good attributes.
The index developed by Mulamba and Mock (1978) was applied to these advanced breeding generations to predict the gains from selection. A selection intensity of 19% was applied with the intention of selecting 16 soybean genotypes, i.e., the same number selected by the principal components analysis, so that the results of both selection methods could be compared.
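The sum of ranks procedure and the 19% selection intensity can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, ties are broken by input order, the trait values are made up, and the favorable direction assumed for each trait (higher GY, lower NDM and L) is a choice of this sketch.

```python
def ranks(values, higher_is_better):
    """Rank 1 is the most favorable value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def rank_sum_index(traits, directions):
    """Sum, for every genotype, its ranks over all traits (lower total is better)."""
    n = len(next(iter(traits.values())))
    totals = [0] * n
    for name, values in traits.items():
        for i, r in enumerate(ranks(values, directions[name])):
            totals[i] += r
    return totals

def select(totals, intensity):
    """Indices of the best genotypes for a given selection intensity (0 to 1)."""
    n_selected = round(intensity * len(totals))
    best_first = sorted(range(len(totals)), key=lambda i: totals[i])
    return best_first[:n_selected]

# Made-up values for five genotypes (illustration only, not data from the study).
traits = {
    "GY":  [3200, 4100, 2900, 3900, 3500],   # grain yield
    "NDM": [120, 130, 115, 118, 125],        # days to maturity
    "L":   [2, 3, 1, 1, 2],                  # lodging score
}
directions = {"GY": True, "NDM": False, "L": False}  # True: higher is better

totals = rank_sum_index(traits, directions)
chosen = select(totals, 0.40)

# With 85 genotypes and 19% intensity, 0.19 * 85 = 16.15, i.e. 16 genotypes.
n_study = round(0.19 * 85)
```

Because only ranks are summed, no economic weights or variance and covariance estimates are needed, which is the advantage of the index noted in the Introduction.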
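The Mulamba and Mock (1978) sum of ranks index is a good method because it contributes to better total gains for the evaluated traits regarding soybean yield; thus, this selection strategy proved to be efficient in genotype selection. According to Costa et al. (2004), the index based on the Mulamba and Mock (1978) sum of ranks was adequate and gave superior gains in soybean in several studied situations. Leite et al. (2016) and Rosado et al. (2012) also confirmed that this index was adequate because it promoted a balanced distribution of gains and higher total genetic gains by selecting superior genotypes. Bizari et al. (2017) evaluated selection indexes for agronomic traits in segregating soybean populations and likewise concluded that the Mulamba and Mock sum of ranks index, using soybean yield and agronomic value as the main traits, provided better gains in their study than the other indexes applied.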
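For reference, selection gains of this kind are conventionally computed from the selection differential and the heritability. The expressions below are the standard textbook form, written with the symbols listed in the legend of Table 4; they are given here as an assumption about the calculation, since the exact estimator used is not stated in this section.

```latex
\begin{align}
DS &= \bar{X}_s - \bar{X}_0, \\
SG &= DS \cdot h^2, \\
SG(\%) &= 100 \cdot \frac{SG}{\bar{X}_0},
\end{align}
```

where \bar{X}_0 is the original average, \bar{X}_s is the average of the selected genotypes, DS is the selection differential, and h^2 is the heritability of the trait.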

CONCLUSIONS
1. The principal components analysis enabled us to describe and select 16 Roundup Ready soybean genotypes with agronomic superiority;
2. The traits that make up the yield components (number of branches and number of pods) exert the most influence on soybean yield because they are positively associated with it;
3. The K-means and Ward clustering analyses gave similar results, since both placed the genotypes that were specific for the traits selected by the principal components analysis in the same group;
4. The multivariate analysis selected the same genotypes as the Mulamba and Mock selection index.

Figure 1 - Biplot of the 85 RR soybean genotypes dispersed as a function of the principal components PC1 x PC2 (A) and PC1 x PC3 (B), with projection of the trait vectors: NDM - number of days to maturity (days); PHM - plant height at maturity (cm); L - lodging (visual score); AV - agronomic value (visual score); NB - number of branches (un); NP - number of pods (un); WHS - weight of 100 seeds (g); and GY - grain yield (kg ha⁻¹)

Figure 2 - Dendrogram representing the grouping of the 85 RR soybean genotypes, obtained by the Ward method from 8 agronomic traits

Table 1 - Crosses and respective genotypes, total number of genotypes, number of identified RR genotypes, and their codes as a function of the number of identified genotypes per cross

Table 2 - Correlation between the agronomic traits and the principal components (PC) of the 85 soybean genotypes with the RR gene in the F5 generation. NDM - number of days to maturity (days); PHM - plant height at maturity (cm); L - lodging (visual score); AV - agronomic value (visual score); NB - number of branches (un); NP - number of pods (un); WHS - weight of 100 seeds (g); and GY - grain yield (kg ha⁻¹)

Table 3 - Optimization grouping of the 85 RR soybean genotypes, obtained by the K-means method, based on 8 agronomic traits, using the Euclidean distance

Table 4 - Selection gain (SG) obtained by the Mulamba and Mock (1978) method in F5 RR soybean genotypes for the traits: NDM - number of days to maturity (days); PHM - plant height at maturity (cm); L - lodging (visual score); AV - agronomic value (visual score); NB - number of branches (un); NP - number of pods (un); WHS - weight of 100 seeds (g); and GY - grain yield (kg ha⁻¹). X0 - original average; Xs - selected average; h2R - restricted heritability; DS - selection differential