Genetic divergence in soybean parents for backcrossing programs

To evaluate superior hybrid combinations, biometric techniques based on the quantification of heterosis, as in diallel analyses, and predictive procedures are commonly used. Diallel analyses have the disadvantage of requiring the evaluation of each parent in all possible combinations, which may be impossible in studies involving many parents. Predictive methods, which dispense with the prior establishment of the hybrid combinations, have therefore attracted interest. These methods are based on the morphological, physiological and molecular differences, among others, that the parents present in the evaluation of divergence, generally quantified by a measure of dissimilarity. The choice of the most adequate method is determined by the desired precision, the ease of analysis and the way the data were obtained (Cruz 2001).
When molecular genetic maps first appeared (Bernatzky et al. 1986, Shoemaker et al. 1992, Morgante et al. 1994, Akkaya et al. 1995, Shoemaker and Specht 1995), map-based molecular markers became a powerful tool in genetic improvement. The term molecular improvement is used to describe improvement programs that are assisted by DNA-based diagnostics. One of the most commonly used classes of markers in improvement is that of microsatellites or SSRs (Simple Sequence Repeats), which are sequences of 1 to 6 bp repeated in tandem and found across the entire genome. SSRs are PCR-amplified with primers complementary to the conserved regions that flank the SSR region. The polymorphism of the fragments (alleles) results from variation in the length of the SSR repeats, and the fragments are separated on agarose or polyacrylamide gels (Narvel et al. 2000). Owing to their wide distribution across the chromosomes, SSR markers allow a more in-depth evaluation of the entire genome, resulting in improved monitoring of gene introgression during the backcrossing process. In addition, these markers support the selection of base populations for improvement programs, generating information for genotype clustering and crossing schedules (Brown-Guedira et al. 2000).
Depending on the number of evaluated plants, the matrix of estimated genetic distances becomes unwieldy, and the difficulty of assessing relationships among plants hampers selection. To solve this problem, one can use statistical procedures such as cluster analysis, construction of dendrograms, estimation of principal components or multidimensional scaling (Santos 1994).
Based on genetic dissimilarities determined by microsatellite analysis, estimated by the simple coincidence coefficient and grouped by the UPGMA, Single linkage and Tocher methods, our study aimed to select the soybean parents genetically closest to each other, with a view to the introgression of high-protein alleles into elite soybean varieties by backcrossing.

Plant material and crossings
Three varieties and six lines of soybean from the Cooperativa Central Agropecuária de Desenvolvimento Tecnológico e Econômico Ltda - COODETEC (Cooperative Center for Technological and Economical Development in Agriculture and Animal Husbandry) were used as recurrent parents (Table 1). Eighteen high-protein lines (47 to 52%) of the soybean breeding program of BIOAGRO/UFV (Table 1) were used as donors of the high protein content trait, with the objective of incorporating alleles for this trait into the COODETEC varieties. The donors were selected after determining the protein content of the seeds by the Kjeldahl method for the quantification of total nitrogen, as described by the Association of Official Analytical Chemists (AOAC 1975), with modifications. The genotypes were fingerprinted to identify the closest ones in order to accelerate the backcrossing program.

DNA extraction and purification
DNA samples were extracted by the method described by McDonald et al. (1994), with modifications. To 50 mg of seeds, 1000 µL of extraction buffer containing 0.2 M Tris-HCl, 0.28 M NaCl, 25 mM EDTA and 10% SDS were added; the material was homogenized in a Polytron and centrifuged immediately for 10 min at 14,000 rpm. The supernatants were transferred to new tubes, 10 µL of proteinase K (10 mg mL⁻¹) and 10 µL of 1 mM CaCl₂ were added, and the tubes were immersed in a 55 ºC water bath for 30 min. The samples were then supplemented with 900 µL of isopropanol and left to settle for 2 min, after which they were centrifuged for 10 min at 14,000 rpm. The supernatants were discarded, and the precipitate was dried for 15 min at room temperature, resuspended in Tris-EDTA (TE) solution (10 mM and 1 mM, respectively) with 60 µg mL⁻¹ RNase, and immersed in a water bath for 90 min. The samples were precipitated again by adding 900 µL of isopropanol and left to settle for 2 min. Thereafter, the samples were centrifuged once more for 10 min at 14,000 rpm and the supernatants discarded. The precipitates were finally resuspended in TE (10 mM and 1 mM). The DNA concentration was estimated spectrophotometrically by reading the absorbance at 260 nm; each absorbance unit corresponded to 50 µg mL⁻¹ of double-stranded DNA (Sambrook et al. 1989).
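The spectrophotometric estimate reduces to a single conversion. The following is a minimal sketch of that arithmetic; the function name, the dilution factor parameter and the example reading are illustrative assumptions and are not taken from the study.

```python
# Conversion factor stated in the text: one absorbance unit at 260 nm
# corresponds to 50 µg/mL of double-stranded DNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (µg/mL) from an A260 reading.

    dilution_factor is a hypothetical parameter: how many times the
    sample was diluted before the reading (1.0 means undiluted).
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor > 0")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


# Illustrative reading (not from the study): A260 = 0.25 on a 1:20 dilution.
example_concentration = dsdna_concentration(0.25, dilution_factor=20)
print(f"{example_concentration:.1f} µg/mL")
```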

Amplification reactions
The microsatellites were amplified in a total volume of 15 µL, containing 12.5 µM Tris-HCl (pH 8.3), 62.5 µM KCl, 2.5 mM MgCl₂, 125 µM of each deoxynucleotide (dATP, dTTP, dGTP and dCTP), 0.6 µM of each primer, one unit of Taq polymerase and 18 ng of DNA. The amplifications were performed in a Perkin Elmer 9600 thermal cycler programmed for an initial phase of 7 min at 72 ºC, followed by 30 cycles of 1 min at 94 ºC, 1 min at 50 ºC and 2 min at 72 ºC, plus a final step of 7 min at 72 ºC. The amplified fragments were separated by electrophoresis in 10% non-denaturing polyacrylamide gel for four hours at 120 V and stained with ethidium bromide solution (0.2 µg mL⁻¹). After electrophoresis, the gels were photographed with the Eagle Eye II (Stratagene) digital image system.

Estimators of genetic distances
The genetic distances between the parents were estimated from the microsatellite data, using the complement of the simple coincidence (SC) coefficient as the measure of dissimilarity. The SC between two plants was obtained by dividing the number of microsatellite loci with alleles in common by the total number of evaluated loci. The data matrix was built by assigning value 2 to a microsatellite locus with two copies of the homozygous dominant allele, value 1 to each allele when the locus was heterozygous, value 0 when the locus had two copies of the recessive homozygous allele and value 9 when the reaction failed.
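The dissimilarity measure can be illustrated with a minimal sketch. It rests on two assumptions that the text does not spell out: a locus is counted as coincident when both genotypes carry the same code, and loci coded 9 (failed reaction) in either genotype are left out of the denominator. The function names and the toy genotypes are hypothetical.

```python
MISSING = 9  # code used in the text for a failed reaction


def sc_dissimilarity(a, b):
    """Return 1 - SC for two genotypes coded 0/1/2 per locus (9 = missing).

    Assumptions: a locus is "coincident" when both codes are identical,
    and loci missing in either genotype are skipped.
    """
    if len(a) != len(b):
        raise ValueError("genotypes must be scored at the same loci")
    scored = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not scored:
        raise ValueError("no locus was scored in both genotypes")
    matches = sum(1 for x, y in scored if x == y)
    return 1.0 - matches / len(scored)


def dissimilarity_matrix(genotypes):
    """Map every unordered pair of genotype names to its dissimilarity."""
    names = list(genotypes)
    return {
        frozenset((p, q)): sc_dissimilarity(genotypes[p], genotypes[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Toy data (not from the study): three genotypes scored at five loci.
toy_genotypes = {
    "G1": [2, 2, 0, 1, 9],
    "G2": [2, 0, 0, 1, 2],
    "G3": [0, 0, 2, 1, 2],
}
toy_matrix = dissimilarity_matrix(toy_genotypes)
for pair, value in toy_matrix.items():
    print(sorted(pair), round(value, 2))
```

In the toy data, G1 and G2 share three of the four loci scored in both, so their dissimilarity is 0.25.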

Clustering methods
Three clustering methods were used to better visualize the groups for crossings, since, depending on the number of evaluated genotypes, the genetic distance matrix does not allow an overall view of the relationships among the genotypes. These were the agglomerative UPGMA (unweighted pair group method with arithmetic mean) and Single linkage methods and, to obtain mutually exclusive groups, the Tocher optimization method.
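The difference between the two agglomerative methods lies only in how the distance between two clusters is defined: the smallest pairwise distance for Single linkage and the arithmetic mean of all pairwise distances for UPGMA. The sketch below is a plain-Python illustration of this idea, not the software used in the study; the function name, the stopping rule at a cut-off distance and the toy distances are assumptions.

```python
def agglomerate(dist, labels, method, cutoff):
    """Merge clusters until the closest pair lies farther apart than cutoff.

    dist   : dict mapping frozenset({a, b}) -> dissimilarity
    method : "single" (minimum pairwise distance) or
             "upgma"  (unweighted mean of all pairwise distances)
    Returns the clusters obtained by cutting the dendrogram at cutoff.
    """
    if method not in ("single", "upgma"):
        raise ValueError("method must be 'single' or 'upgma'")
    clusters = [[label] for label in labels]

    def linkage(c1, c2):
        values = [dist[frozenset((a, b))] for a in c1 for b in c2]
        if method == "single":
            return min(values)
        return sum(values) / len(values)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy distances (not from the study) among four hypothetical genotypes.
toy_pairs = {
    ("A", "B"): 0.10, ("A", "C"): 0.35, ("B", "C"): 0.70,
    ("A", "D"): 0.80, ("B", "D"): 0.90, ("C", "D"): 0.70,
}
toy_dist = {frozenset(k): v for k, v in toy_pairs.items()}
toy_labels = ["A", "B", "C", "D"]

single_groups = agglomerate(toy_dist, toy_labels, "single", cutoff=0.40)
upgma_groups = agglomerate(toy_dist, toy_labels, "upgma", cutoff=0.50)
print(single_groups)
print(upgma_groups)
```

In this toy case Single linkage attaches C to the pair A-B through its single close neighbour A, whereas UPGMA keeps C apart because its mean distance to the pair exceeds the cut-off.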

RESULTS AND DISCUSSION
Studies of genetic divergence in plants generally aim to characterize varieties (Priolli et al. 2002), to obtain information on the genetic constitution of germplasm (Bao-Rong L 2004) and to assess the variability in crops (Kudryavtsev et al. 2004). From a practical point of view, they are used to verify genealogies, to plan crossings with superior genotypes and to exploit heterosis.
The use of microsatellite markers (SSR) for the characterization of plant genotypes has become quite common in breeding programs (Narvel et al. 2000, Tanya et al. 2001). Figure 1 presents the amplification products of primer Satt181 in 10% polyacrylamide gel. Table 2 shows the distribution of the 57 SSR primer pairs selected in the present study across the linkage groups, according to information from the integrated map of soybean (Cregan et al. 1999); four of the markers were not mapped. Eighteen of the 24 linkage groups of the integrated soybean map were represented, indicating that the soybean genome was well sampled by the SSR markers. In this sense, Hospital et al. (1992) observed that molecular markers are useful in gene introgression programs provided that they are well distributed across the genome and their positions are known.
The genetic distances, based on the complement of the simple coincidence coefficient (SC), varied from 0.08 to 0.74, demonstrating the great capacity of SSR markers to evaluate genetic diversity in soybean. Using 12 SSR markers, Priolli et al. (2002) observed genetic distances that varied, on average, from 0.4 to 0.9 among 186 Brazilian soybean cultivars. The smallest genetic distance (0.08) was found between two sibling donor lines for high-protein alleles, CD 203PT30-1 (3) and CD 203PT30-3 (4). The largest genetic distance (0.74) was observed between a donor line of high-protein alleles, CD 203PT30-3 (4), and the variety with normal protein content CD 983321 (15). There is a large genetic variability among the donor parents of high-protein alleles (0.08 to 0.70). This result is important since it indicates that these parents, besides representing excellent sources of high-protein alleles, have a great genetic diversity, which is desirable in parents for allele introgression by means of backcrossing. A large genetic variability (between 0.28 and 0.72) was also observed among the recurrent parents.
The dissimilarity matrix was used for the cluster analyses by the agglomerative UPGMA and Single linkage methods. A cut-off linkage distance of 0.40 was defined for the Single linkage dendrogram (Figure 2) and of 0.50 for the UPGMA dendrogram (Figure 3). The groups formed in each dendrogram were then established according to the given limit. The dendrogram obtained by the Single linkage method (Figure 2) allowed the formation of six groups with the following constitutions: Group 1 formed by 14 parents; Group 2 by 7; Group 3 by 3; and Groups 4, 5 and 6 by 1 parent each. The dendrogram obtained by the UPGMA method (Figure 3) united the parents in five distinct groups, as follows: Group 1 formed by 10 parents; Group 2 by 3; Group 3 by 3; Group 4 by 4; and Group 5 by 7 parents.
To obtain mutually exclusive groups, the Tocher optimization method was used, which identified nine groups (Table 3): group 1 formed by 7 parents; group 2 by 6; groups 3, 4 and 5 by 3 each; group 6 by 2; and groups 7, 8 and 9 by 1 parent each. A comparison of the three clustering methods (Table 3) demonstrated that the Tocher method was the one that differentiated the largest number of groups, subdividing group 1. Nevertheless, the three methods agreed in the clustering of most of the analyzed genotypes; the Single linkage result was closer to Tocher than the UPGMA result was.
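The logic of the Tocher method can be sketched as follows. This is an illustration of the procedure as it is commonly described, not the implementation used in the study: the threshold is taken as the largest of the nearest-neighbour distances, each group is seeded with the closest remaining pair, and a genotype joins a group only while its mean distance to the group members stays within the threshold. The handling of leftover genotypes as single-member groups, the function name and the toy distances are assumptions.

```python
def tocher(dist, labels):
    """Group labels by a Tocher-style optimization (needs >= 2 labels).

    dist : dict mapping frozenset({a, b}) -> dissimilarity
    Returns (groups, theta), where theta is the inclusion threshold.
    """
    def d(a, b):
        return dist[frozenset((a, b))]

    # Threshold: the largest of the nearest-neighbour distances.
    theta = max(min(d(a, b) for b in labels if b != a) for a in labels)

    remaining = list(labels)
    groups = []
    while remaining:
        if len(remaining) == 1:
            groups.append([remaining.pop()])
            break
        # Seed a new group with the closest pair still unassigned.
        seed_d, a, b = min(
            (d(x, y), x, y)
            for i, x in enumerate(remaining)
            for y in remaining[i + 1:]
        )
        if seed_d > theta:
            # Assumption: no remaining pair qualifies, so each genotype
            # left over forms its own group.
            groups.extend([g] for g in remaining)
            break
        group = [a, b]
        remaining = [g for g in remaining if g not in group]
        while remaining:
            mean_d, candidate = min(
                (sum(d(g, m) for m in group) / len(group), g)
                for g in remaining
            )
            if mean_d > theta:
                break
            group.append(candidate)
            remaining.remove(candidate)
        groups.append(sorted(group))
    return groups, theta


# Toy distances (not from the study) among five hypothetical genotypes.
toy_pairs = {
    ("A", "B"): 0.10, ("A", "C"): 0.35, ("A", "D"): 0.80, ("A", "E"): 0.85,
    ("B", "C"): 0.70, ("B", "D"): 0.90, ("B", "E"): 0.90,
    ("C", "D"): 0.70, ("C", "E"): 0.75, ("D", "E"): 0.20,
}
toy_dist = {frozenset(k): v for k, v in toy_pairs.items()}
toy_groups, toy_theta = tocher(toy_dist, ["A", "B", "C", "D", "E"])
print(toy_groups, toy_theta)
```

Because the threshold is tied to nearest-neighbour distances, this kind of procedure tends to produce more and smaller groups than a dendrogram cut at a generous linkage distance, which is consistent with the larger number of groups reported for Tocher in Table 3.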
Similar results were observed by Tanya et al. (2001), who grouped 16 soybean varieties (genotypes) into five distinct groups using 20 SSR markers and the UPGMA method. Likewise, Baranek et al. (2002), using RAPD markers, grouped 19 soybean genotypes into five distinct groups with the coefficient of Nei and the UPGMA clustering method. The donor lines with a common origin mostly grouped in the same cluster, as was the case of the donor lines CD 202PT (6, 7, 8 and 9) and CD 203 PT (3, 4 and 5), while the donor lines derived from OC 671 PT (10, 11 and 12) grouped into another cluster. The case of some sibling lines that grouped in different clusters, for instance the lines derived from B1PTA (25, 26 and 27), can be explained by the small number of backcrosses, three (without fingerprinting of the backcrossed plants), which gives an expected recovery of around 93.75% of the recurrent parent genome and was therefore insufficient to recover it completely. Discussions on the recovery of the recurrent parent genome during the backcross generations have focused on the expected proportion of the recurrent parent without taking the variation around the expected mean into consideration (Openshaw et al. 1994). In this sense, SSR markers (provided that they are well distributed) can define the genetic constitution of the genotypes more accurately and increase the effectiveness of the backcrossing method.
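The figure of 93.75% follows from the usual expectation that each backcross halves the remaining donor share of the genome. A minimal sketch of this arithmetic is given below; the function name is hypothetical, and the value is an expected mean only, which is precisely the limitation raised by Openshaw et al. (1994).

```python
def expected_recurrent_fraction(n_backcrosses):
    """Expected share of recurrent-parent genome after n backcrosses.

    The F1 carries 1/2 of the recurrent genome and every backcross halves
    the remaining donor share, so the expectation is 1 - (1/2) ** (n + 1).
    Individual plants vary around this mean.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for n in range(1, 5):
    print(f"BC{n}: {expected_recurrent_fraction(n):.4%}")
```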
The donor parents for high-protein alleles were selected based on their genetic dissimilarities with the recurrent parents, obtained from the genetic distance matrix (Table 4). As the table shows, three to five donor parents with genetic distances varying from 0.28 to 0.57 were selected for crossings with each recurrent parent. Maintaining genetic diversity by using more than one donor of high-protein alleles is extremely important. Although protein content is controlled by few genes, it is a complex trait that is strongly influenced by the environment, indicating the action of minor genes on the trait. The numbers in brackets in Table 4 correspond to the codes assigned to the genotypes in Table 1. Thus, five donor parents (distances from 0.28 to 0.48) were selected for the crossing with CD 983321 (15), four donors (distances from 0.31 to 0.56) for CD 985015 (16), three donors (distances from 0.49 to 0.51) for OC 953194 (18), three donors (distances from 0.37 to 0.52) for CD 211 (19), four donors (distances from 0.43 to 0.47) for CD 210 (20), four donors (distances from 0.41 to 0.47) for OC 953006 (21), four donors (distances from 0.41 to 0.43) for OC 953312 (22), four donors (distances from 0.48 to 0.54) for CD 983343 (23), and four donors (distances from 0.42 to 0.57) for CD 204 (24).
Faleiro et al. (2004) used RAPD markers to assist a backcrossing program that aimed at the introduction of rust and anthracnose-resistance genes into common bean. The DNA fingerprinting of the resistant plants was used to select the plants genetically closest to the recurrent parent. The genetic distances of the resistant plants in relation to the recurrent parent varied from 9 to 59% for BC1, from 33% for BC2 and from 0 to 7% for BC3. Five rust and anthracnose-resistant lines were obtained after only three backcrosses, with a genetic distance of approximately 0% in relation to the recurrent parent.
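Selecting the donors closest to a given recurrent parent amounts to ranking the corresponding row of the distance matrix. The sketch below illustrates that step only; the function name, the parameter k, the genotype labels and the distances are hypothetical and do not reproduce Table 4.

```python
def closest_donors(distances, recurrent, donors, k):
    """Return the k donors with the smallest distance to one recurrent parent.

    distances : dict mapping (recurrent, donor) -> dissimilarity
    Ties keep the order in which the donors were listed.
    """
    ranked = sorted(donors, key=lambda donor: distances[(recurrent, donor)])
    return [(donor, distances[(recurrent, donor)]) for donor in ranked[:k]]


# Toy distances (not from Table 4) between one recurrent parent (RP1)
# and four candidate donor parents (DP1 to DP4).
toy_distances = {
    ("RP1", "DP1"): 0.52,
    ("RP1", "DP2"): 0.28,
    ("RP1", "DP3"): 0.61,
    ("RP1", "DP4"): 0.40,
}
selected = closest_donors(toy_distances, "RP1", ["DP1", "DP2", "DP3", "DP4"], k=3)
print(selected)
```

Keeping k above one reflects the point made in the text that more than one donor per recurrent parent helps to maintain genetic diversity for a trait influenced by minor genes.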
The next stage in the backcrossing program will be the evaluation and selection of BC1F2 plants for high protein content and the fingerprinting of these plants for a later selection of those genetically closest to the recurrent parent. In this way, our study aims to obtain high-protein soybean lines with a smaller number of backcrosses, guided by SSR-marker fingerprinting.
Key words: fingerprinting, microsatellites, genetic distances, molecular markers.
Table 2. Distribution of the 57 SSR markers selected for the analysis of genetic divergence in the linkage groups of the integrated map of soybean of the USDA, University of Utah and University of Nebraska (from Cregan et al. 1999)

Figure 1. Amplification products of primer Satt181 in the 27 parents in 10% polyacrylamide gel. The last column on the right represents the molecular mass pattern of pUC18 plasmid digested with the enzyme Msp I

Figure 2. Single linkage dendrogram containing the 27 parents used in the backcrossing program. RP indicates a recurrent and DP a donor parent

Figure 3. UPGMA dendrogram containing the 27 parents used in the backcrossing program. RP indicates a recurrent and DP a donor parent

Table 3. Comparison between the clustering methods. Individuals that are common to the three clusters appear in bold