Microsatellite-assisted backcross selection in maize

A microsatellite marker (SSR) was chosen to simulate a target allele and three criteria (02, 04 and 06 markers per chromosome) were tested to evaluate the most efficient parameters for performing marker-assisted backcross (MAB) selection. We used 53 polymorphic SSRs to genotype 186 BC1 maize (Zea mays L.) plants produced by crossing the inbred maize lines L-08-05 (donor parent) and L-14-4B (recurrent parent). The second backcross (BC2) generation was produced with 180 plants and screened with markers which were not recovered from the first backcross (BC1) generation. A total of 480 plants were evaluated in the third backcross (BC3) generation from which 48 plants were selected for parental genotype recovery. Recurrent genotype recovery averages in three backcross generations were compatible with those expected in BC4 or BC5, indicating genetic gain due to the marker-assisted backcrossing. The target marker (polymorphic microsatellite PHI037) was efficiently transferred. Six markers per chromosome showed a high level of precision for parental estimates at different levels of maize genome saturation and donor alleles were not present in the selected recovered pure lines. Phenotypically, the plants chosen based on this criterion (06 markers per chromosome) were closer to the recurrent parent than any other selected by other criteria (02 or 04 markers per chromosome). This approach allowed the understanding that six microsatellites per chromosome is a more efficient parameter than 02 and 04 markers per chromosome for deriving a marker-assisted backcross (MAB) experiment in three backcross generations.


Introduction
Backcross breeding in maize (Zea mays L.) has been extensively used to transfer favorable alleles for monogenic traits from donor genotypes to elite inbred lines, but high-heritability polygenic traits have also been transferred through this method (Rinke and Sentz, 1961; Shaver, 1976). Two problems are inherent in backcross breeding: the large number of generations required to recover the genome of the recurrent inbred parent, and linkage drag, i.e. the presence of portions of the donor's genome (which can be linked to non-favorable alleles) surrounding the introgressed allele.
As previously described by Tanksley et al. (1989), the main advantage of using DNA markers as opposed to conventional selection is to accelerate the fixation of recipient alleles in non-target regions and to identify the genotypes containing crossovers close to target genes (Ribaut and Hoisington, 1998). According to Frisch et al. (1999a), molecular markers are used in backcross breeding for two purposes: (i) as a diagnostic tool to trace the presence of a target allele when direct selection is difficult or impossible, as in the case of recessive alleles expressed late in plant development or quantitative trait loci; this use was first proposed by Tanksley (1983), reviewed by Melchinger et al. (1990) and later termed 'foreground selection'; and/or (ii) to identify individuals with a low proportion of undesirable genome from the donor parent, an approach called 'background selection' that was first proposed by Tanksley et al. (1989) and then by Hillel et al. (1990), further investigated by Hospital et al. (1992) and later reviewed by Visscher et al. (1996).
Several software programs, such as the PLABSIM program (Frisch et al., 2000), are now available to make selection predictions using simulations. Hospital et al. (1992) investigated the use of markers for the recovery of the recipient genome during an introgression breeding program and showed that if marker assisted introgression is used it should be performed on three generations. These authors also recommended the use of markers with known map position and a density of two or three markers per 100 cM, because increasing this density results in only small benefits. Jarboe et al. (1994) have used the maize genome as a model for simulation and reported that three backcross generations and 80 markers were needed to recover 99% of the recurrent parent genotype.
The efficiency of marker-assisted backcrossing (MAB) selection as a breeding tool has also been evaluated considering the heritability of the target trait (Knapp, 1998) and by monitoring the target genomic regions simultaneously as opposed to one by one. Frisch et al. (1999a) also developed an approach to increase the efficiency of MAB in which success would depend on the carrier chromosome, the chromosomal position of the target locus, its distance to the flanking marker loci and the number of individuals evaluated. Performing simulations with the published maize map of 80 markers and phenotypic selection, Frisch et al. (1999b) also found that increasing the population size from the first backcross (BC1) generation to the third backcross (BC3) generation reduced the number of marker data points by as much as 50% without affecting the recurrent parent genotype proportion. Ribaut et al. (2002) presented simulations of different strategies, using the maize genome as a model, to compare allelic introgression with DNA markers through backcrossing. Ragot et al. (1994) demonstrated that MAB could be efficiently used for introgressing a transgene construct containing the Bt gene (the Bacillus thuringiensis toxin gene) of a transformed parent into an elite maize inbred, reaching the same level of parent genotype recovery in BC3 as that expected for the sixth backcross (BC6) generation. Stuber (1994) used previously mapped favorable quantitative trait loci (QTLs) from two inbred lines and successfully transferred them to other inbred lines lacking these QTLs. Single genes with large effects conferring resistance to bacterial blight in rice have also been transferred using marker-assisted selection (Huang et al., 1997; Sanchez et al., 2000).
When evaluating the length of the donor segment with the Vrn-B1 locus of first and third generation sister wheat lines, Salina et al. (2003) concluded that, if the selection is not molecular marker assisted, plants with a long donor segment linked to the target gene may be randomly selected. The objective of our research was to evaluate a maize marker-assisted backcrossing scheme, with emphasis on ascertaining how many markers per chromosome should be used and the number of generations in which the genotype of the parental line could be recovered. This research was carried out to verify the differential efficiencies of the number of markers per chromosome in background assisted backcross selection in maize by introgressing a target microsatellite marker.

Plant material and backcrossing generations
The donor parent was the early-maturing orange flint kerneled inbred maize line L-08-05 (S7; parent 1, coded as P1) derived from line IG-1 and the recurrent parent was the early-maturing yellow dent kerneled inbred maize line L-14-4B (S5; parent 2, coded as P2) derived from line BR-106. These populations were allocated into different heterotic groups (Naspolini Filho et al., 1981; Souza Jr et al., 1993).
Only one F1 ear was chosen from the P1 x P2 cross. The BC1 generation resulted from the backcrossing of the F1 generation with the L-14-4B (P2) recurrent parent. We genotyped 186 BC1 plants and selected 12 progenies, four plants for each selection criterion (SC), the criteria being based on the number of markers per chromosome (SC1 = 2, SC2 = 4 and SC3 = 6). The second backcross (BC2) generation was derived from the cross of selected BC1 plants to P2 plants, and 180 BC2 plants were genotyped, of which we selected 15 plants per progeny/selection criterion to give a total of 180 plants. In the BC3 generation we genotyped 480 BC3 plants and selected ten plants per progeny/selection criterion, reaching a total of 160 plants per selection criterion. Of the 480 plants genotyped, four plants were selected per progeny/selection criterion, making a total of 16 plants per selection criterion and a total of 48 selected lines.
The differences in the number of plants evaluated in each selection cycle (BC1 to BC3) were due to the need to adapt the MAB scheme to practical laboratory conditions. Since each PCR plate supported 96 DNA samples and each microsatellite locus needed P1, P2 and F1 control profiles, 93 wells per plate remained available for samples, so two PCR plates were needed for 186 samples, representing a total of 106 amplifications (53 microsatellites x 2 plates) in the first selection cycle (BC1). The second cycle (BC2) was made up of the fifteen samples selected from each criterion (SC1 to SC3), which represented a total of 60 plants per selection criterion and 180 plants genotyped for the BC2 generation. In the third selection cycle (BC3) the number of plants selected per criterion was reduced from fifteen to ten due to the number of DNA samples to be manipulated. In all, 480 DNA samples were amplified with the non-recovered microsatellites.
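The plate arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and reading the 106 amplifications as 53 microsatellites run on two plates each is an assumption drawn from the numbers in the text.

```python
import math

WELLS_PER_PLATE = 96     # stated in the text
CONTROLS_PER_PLATE = 3   # P1, P2 and F1 control profiles, stated in the text


def plates_needed(n_samples, wells=WELLS_PER_PLATE, controls=CONTROLS_PER_PLATE):
    """Number of PCR plates needed for one marker, after reserving control wells."""
    usable_wells = wells - controls
    return math.ceil(n_samples / usable_wells)


def plate_amplifications(n_samples, n_markers):
    """Plate-level amplifications: one set of plates per marker (assumed reading)."""
    return plates_needed(n_samples) * n_markers


print(plates_needed(186))             # 2
print(plate_amplifications(186, 53))  # 106
```

With 93 usable wells per plate, 186 samples fill exactly two plates, which is why the BC1 population size fits the laboratory routine so neatly.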

Molecular selection
As outlined above, our molecular approach for selecting plants in each backcross generation was based on three selection criteria which differed in respect to the number of markers per chromosome (SC1 = 2, SC2 = 4 and SC3 = 6). More than 250 microsatellites (synthesized by GIBCO BRL, São Paulo, SP) were screened and used to genotype the BC1 generation, but only 53 were shown to be polymorphic and thus used in the final analysis. Besides their use for polymorphic profiling, markers were also chosen for their map position and distribution along the chromosomes. The maize genome has been dissected into 100 evenly spaced "bins" of approximately 20 cM each (Gardiner et al., 1993).
For SC1 one marker per chromosome arm was chosen (maintaining a standard distance from the centromere to increase recombination and decrease linkage drag), while for SC2 we chose two markers per chromosome arm and for SC3 approximately three markers per chromosome arm were selected (Table 1). Molecular criteria were used to compare the recovery of the recurrent parent genotype and the effectiveness of increasing the number of markers per chromosome used during the selection process.
A polymorphic microsatellite, PHI037 (bin 1.08), was selected and used as the target locus to be introgressed. Apart from its reliable pattern, PHI037 also had polymorphic flanking microsatellites (PHI011, at bin 1.09 for SC1/SC2; BNLG615, at bin 1.07 for SC3) that could contribute to diminishing the linkage drag around the locus. Every plant selected satisfied the following conditions: (1) the introgressed marker was heterozygous; (2) the plant showed the highest recovery percentage of the recurrent parent genotype; and (3) no donor allele was present in the genotyping of the individual plant.
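The three selection conditions can be sketched as a simple filter-and-rank step. This is a hedged illustration only: the data layout, the function names and the toy plants are hypothetical, and treating condition (3) as an optional strict filter that rejects any background locus still carrying a donor allele (A or H) is an assumption, since the text does not say how it was applied in early generations.

```python
def recurrent_recovery(calls):
    """Percentage of recurrent-parent genome from a non-empty list of A/B/H calls."""
    a, b, h = calls.count("A"), calls.count("B"), calls.count("H")
    return 100.0 * (b + 0.5 * h) / (a + b + h)


def select_plants(plants, n_select, require_no_donor=False):
    """Return the names of the n_select best plants.

    plants maps a plant name to (target_call, background_calls).
    """
    candidates = []
    for name, (target_call, background) in plants.items():
        # Condition (1): the introgressed marker must be heterozygous.
        if target_call != "H":
            continue
        # Condition (3), assumed strict form: no donor allele at background loci.
        if require_no_donor and any(c in ("A", "H") for c in background):
            continue
        candidates.append((recurrent_recovery(background), name))
    # Condition (2): rank by recovery of the recurrent parent genotype.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in candidates[:n_select]]


# Toy data, invented for illustration only.
toy = {
    "plant_a": ("H", ["B", "B", "H", "B"]),
    "plant_b": ("B", ["B", "B", "B", "B"]),  # target allele lost
    "plant_c": ("H", ["B", "B", "B", "B"]),
    "plant_d": ("H", ["H", "H", "B", "B"]),
}
print(select_plants(toy, 2))                         # ['plant_c', 'plant_a']
print(select_plants(toy, 2, require_no_donor=True))  # ['plant_c']
```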
DNA extraction followed the procedures described in Hoisington et al. (1994). Each amplification included DNA from the donor (P1) and recurrent (P2) parents and the F1 generation. Only the microsatellites not recovered at BC1 were used to screen the BC2 and BC3 generations. The 48 selected BC3 plants (BC3sel) were analyzed with microsatellites not used in previous generations, making a total of 72 microsatellites spaced at an average of 25 cM (Table 1). The SC1 and SC2 BC3sel plants were also screened with microsatellites used in genotyping for the other criterion. Reactions were carried out using Touchdown PCR (Don et al., 1991). Products were separated by 4% (w/v) 1:1 MetaPhor:agarose gel electrophoresis.

Molecular data analysis
Results were scored as A for the P1 allele (coded as Genome donor, GD), B for the P2 allele (coded as Genome recurrent, GR) and H for the F1 pattern of P1 and P2 alleles. The percentage of the P2 recurrent parent genome present in each genotyped plant was estimated using the expression GR% = [(B + 0.5H)/(A + B + H)] x 100, where A, B and H are the numbers of loci scored in each class, so that the expression corresponds to the proportion of B alleles present in the genotype of each plant. The introgressed locus was excluded from the computation of the total recoveries of recurrent parent genotypes.
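As a worked illustration of the recovery estimate, the sketch below computes GR% = (B + 0.5H)/(A + B + H) x 100 from per-marker scores while leaving out the introgressed locus. The function name, the dictionary layout and the marker names m1 to m4 are hypothetical; only PHI037 and the A/B/H coding come from the text.

```python
from collections import Counter


def recurrent_genome_percent(calls, target="PHI037"):
    """GR% over background markers; calls maps marker name to 'A', 'B' or 'H'."""
    counts = Counter(call for marker, call in calls.items() if marker != target)
    a, b, h = counts["A"], counts["B"], counts["H"]
    scored = a + b + h
    if scored == 0:
        raise ValueError("no scored background markers")
    # Each B locus contributes one full unit, each H locus half a unit.
    return 100.0 * (b + 0.5 * h) / scored


example = {"PHI037": "H", "m1": "B", "m2": "B", "m3": "H", "m4": "B"}
print(recurrent_genome_percent(example))  # 87.5
```

Because the target locus is required to stay heterozygous, counting it would cap the estimate below 100% even for a fully recovered background, which is the practical reason for excluding it.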
The BC1 and BC2 percentage recovery means were compared to the expected averages for each backcross generation. The 48 selected BC3 plants were analyzed considering saturating (BC3sel sat) and non-saturating (BC3sel non-sat) conditions. To compare the recovery percentages between the selected plants the data were transformed using the arcsine function. For each selection criterion t-tests were performed to compare the percentage recoveries of the P2 parent in BC3 lines under saturated and non-saturated conditions against an expected average of 93.75% and a saturated average of 100%.
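The expected averages used as reference values follow from halving the donor fraction at each backcross. The sketch below reproduces them and shows one common form of the arcsine transformation. The function names are hypothetical, and the arcsine square-root form (in degrees) is an assumption, since the text only says that "the arcsine function" was used.

```python
import math


def expected_recurrent_percent(n_backcrosses):
    """Expected recurrent-parent genome (%) after n backcrosses without selection."""
    return 100.0 * (1.0 - 0.5 ** (n_backcrosses + 1))


def arcsine_sqrt(percent):
    """Arcsine square-root transform of a percentage, in degrees (assumed form)."""
    return math.degrees(math.asin(math.sqrt(percent / 100.0)))


for generation in (1, 2, 3):
    print(generation, expected_recurrent_percent(generation))
# 1 75.0
# 2 87.5
# 3 93.75
```

The transform stretches values near 100%, which matters here because most selected BC3 plants sit close to that upper bound.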

Experimental evaluation and statistical analysis
The 48 selected BC3 lines were evaluated during the 2002/2003 planting season near the city of Piracicaba in the Brazilian state of São Paulo at three experimental stations (Department of Genetics, Areão, and Caterpillar) belonging to the University of São Paulo (ESALQ/USP). We evaluated 49 entries (48 selected BC3 lines plus the P2 parental line) using a 7 x 7 lattice design with three replicates per location. Plots were 4 m long with 0.80 m between rows and 0.20 m between plants within rows and were over-seeded with 40 seeds per plot, later thinned to 20 plants per plot (equivalent to 62,500 plants ha⁻¹). Data for grain yield (GY, grams per plant), number of kernel-rows per ear (NKE), number of rows per ear (NRE), ear length (EL), ear diameter (ED), plant height (PH), ear height (EH), days to silk extrusion (DSE), days to pollen shed (DPS), number of plants per plot (stand) and grain moisture were recorded in all environments. The NKE, NRE, EL, ED, PH, and EH traits were recorded on five competitive plants per plot and the plot means were used for analyses. The DSE and DPS traits were recorded in each plot as days from planting to the time when 50% of plants per plot showed the trait. Grain yield of each plot was adjusted for average stand by covariance analysis, and to 15% grain moisture, before analyses.
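The stated plant density can be checked from the plot dimensions. The sketch below is a simple arithmetic illustration with hypothetical function names, assuming a single-row plot whose area is its length times the row spacing.

```python
def plants_per_row(plot_length_m, plant_spacing_m):
    """Number of plants that fit in one row at the given within-row spacing."""
    return round(plot_length_m / plant_spacing_m)


def plants_per_hectare(plants_per_plot, plot_length_m, row_spacing_m):
    """Plant density, taking plot area as length times row spacing (assumed)."""
    plot_area_m2 = plot_length_m * row_spacing_m
    return plants_per_plot / plot_area_m2 * 10_000


print(plants_per_row(4.0, 0.20))                  # 20
print(round(plants_per_hectare(20, 4.0, 0.80)))   # 62500
```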
Analysis of variance was computed for each location and, subsequently, joint analysis of variance was computed across locations for each trait. The BC3 Lines source of variation was partitioned into the three selection criteria (SC1, SC2 and SC3) and among selection criteria (ASC), and the interaction BC3 lines x Location (BC3 Lines x L) was partitioned accordingly (SC1 x L, SC2 x L, SC3 x L and ASC x L). The Dunnett test was used to test whether the traits analyzed from each selected BC3 line differed significantly from the P2 recurrent parental inbred (L-14-4B). All analyses were performed using the SAS software, proc GLM (SAS Institute, 1989). Unless otherwise indicated, all stated significances were at p ≤ 0.01.

Results
The three molecular criteria showed an average of 74.3% recovery of the P2 genotype for the first backcross generation. This average did not differ from the expected value (75%); little variation was detected among the three criteria, with the maximum value being 87.5%. The average value of the four selected plants in the BC1 generation was 83.8% for SC1, 83.1% for SC2 and 82.7% for SC3 (Figures 1a, 1b and 1c respectively). In the BC2 generation, the three criteria showed an average of 91.2% and a maximum value of 98.1%. The mean expected value for this generation, without selection, was 87.5%. The average value of the 16 selected plants in the BC2 generation was 94.4% for SC1, 92.5% for SC2 and 92.1% for SC3 (Figures 1d, 1e and 1f respectively). In the BC3 generation, the three criteria showed an average of 96.6%, which was significantly higher than the expected mean (93.75%) for this generation. The maximum value was 100% (without considering the PHI037 allele). The average value of the 16 selected plants in the BC3 generation was 97.3% for SC1, 96.2% for SC2 and 96.4% for SC3 (Figures 1g, 1h and 1i respectively).
The presence of alleles derived from the donor parent was not detected in the non-saturated condition. In the selected saturated BC3 lines, some donor alleles were detected in bins that had not been assayed previously for selection (sat primers, Table 1). The average number of donor alleles was reduced with the increment of markers per chromosome used for selection, i.e. 0.75 ± 0.25 for SC1, 0.44 ± 0.16 for SC2, and 0.13 ± 0.09 for SC3. These results express the average frequency of donor alleles per plant and their associated mean errors. Although the BC3sel sat and BC3sel non-sat conditions considered plants with the same genetic background, the genome saturation estimates of the genetic content for P2 parent recovery were different. The 16 selected BC3 lines in each of the three selection criteria were saturated with more markers, and the mean values for P2 parent recovery showed that the averages for the selected saturated BC3 lines were significantly lower for SC1 and SC2, whereas the difference was not significant for SC3. There was a large and significant decrease in P2 genotype recovery for SC1 (from 99.8% to 92.4%) and for SC2 (99.1% to 95.5%) with genome saturation. The increase in the number of markers also significantly reduced the difference between the averages and standard deviations for BC3sel non-sat and BC3sel sat. All selection criteria, except BC3sel non-sat and BC3sel sat for SC1, were significantly superior to the expected mean value (93.75%) and significantly inferior to the maximum value (100%). The value for SC1/BC3sel non-sat did not differ significantly from the maximum value (100%), whereas SC1/BC3sel sat was significantly inferior to the expected mean value (93.75%) for this generation. It should be emphasized that the SC3 criterion differed significantly from SC2 and SC1 for BC3sel sat (Table 2).
Figure 1 - Distribution of P2 recurrent parent recovery for the three selection criteria (SC1, SC2, SC3) for the three backcross generations (BC1, BC2 and BC3).
No significance was detected between different selection criteria for nine phenotypic traits in the variance analyses, and the among selection criteria (ASC) level was not significant for any of the nine traits. For the BC3 lines, significance was detected for most of the traits (Table 3). Therefore, the analysis of the phenotypic data did not detect differences among the three selection criteria, but there were significant differences between the BC3 lines within each selection criterion.
§: Significant at the 0.05 probability level for 93.75%, and significant at the 0.01 probability level for 100%.
The 48 selected BC3 lines were compared with the P2 parent line using Dunnett's test (Table 4) without Bonferroni's correction for the probability level (Zar, 1999). For each selection criterion, 120 tests were performed, of which 88 (73.3%) were significant for SC1, 52 (43.3%) for SC2 and 8 (6.7%) for SC3. For the SC1 criterion, all BC3 lines differed significantly from the P2 parent for at least one trait; six lines differed significantly for nine traits simultaneously, while the other ten lines diverged significantly for at least three traits simultaneously. For the SC2 criterion, all BC3 lines differed significantly from the P2 parent in at least one trait, with four lines differing simultaneously for all characters and the other 12 lines differing significantly for at least two traits simultaneously.
However, for the SC3 criterion, ten of the sixteen lines did not differ significantly from the P2 parent for any of the traits. Six lines did show significant differences for two characters simultaneously, but none of these lines differed significantly for the PH, EH, DSE or DPS traits; the same behavior was observed in the analysis of variance (Table 3) for these four traits under the SC3 criterion.

Discussion
The plant breeding community is enthusiastic about marker-assisted selection, but a link between theory and practice is still missing. Parameters are defined and simulations represent suitable approaches, providing a strong contribution to the goal of using marker-assisted backcrossing (MAB) in crop improvement (Hospital et al., 1992; Tanksley and Nelson, 1996; Visscher et al., 1996; Knapp, 1998; Frisch et al., 1999a; Frisch et al., 1999b; Frisch et al., 2000; Reyes-Valdés, 2000). However, most of the theoretical papers related to marker-assisted selection present complex mathematical models, making it difficult to directly derive a practical experiment. Differences in laboratory strategies, such as the use of different DNA markers, are rarely taken into account in theoretical papers (Ribaut et al., 2002), and the practical implications of the use of different marker technologies should be considered to achieve selection. Newer and simpler DNA marker systems have been developed and even the technological constraints of early restriction fragment length polymorphism (RFLP) methods seem to have been overcome (Ribaut et al., 1997), with high-density DNA marker maps having been constructed for almost every important crop species (O'Brien, 1993). According to Young (1999), more than 400 articles containing the key words 'marker-assisted breeding' or 'marker-assisted selection' can be found in the 1995 to 1999 Current Contents, yet few, if any, actually describe the use of marker-assisted technology leading to the release of varieties, and few give experimental results validating the efficiency of marker-assisted methodologies in different crop species (Ragot et al., 1994; Stuber, 1994; Salina et al., 2003; Lecomte et al., 2004).
Sometimes the most efficient strategy may lose its clarity, as different theoretical approaches can serve the same purpose. Increasing the population size when advancing the backcross generations reduces the number of markers needed in contrast to a constant population size across all generations (Frisch et al., 1999b), while another way of reducing the number of markers would be to increase the number of markers in each new backcross generation (Hospital et al., 1992). In the study described in this paper the strategy adopted was to maintain the population size in the first and second backcross generations but increase the number of plants in the third backcross generation. The number of markers at non-target loci was reduced at each generation, as only the non-recovered markers were screened in the following generation. Other markers were used to saturate the genome of the selected genotypes obtained from the BC3 selection.
In order to obtain an economical and viable strategy only a few markers were used to assist selection in the BC1 and BC2 generations, but from the BC3 generation other markers were added for the evaluation of the recovered lines and in the end the spacing between markers reached 25 cM across the ten maize chromosomes. Frisch et al. (1999a) and Jarboe et al. (1994) both used the published maize map of 80 markers for their simulations and found promising results. As has been previously demonstrated (Hospital et al., 2002), response to selection would be reduced if a marker from an unknown chromosomal location had been used. Our approach demonstrated that microsatellites could be used efficiently for introgressing a target allele, simulating a monogenic trait, without any intermediate field selection. The idea of using a marker was to control all the steps for the introgression of the target locus from the donor to the recipient line, this approach being neutral and simple to monitor, although, as in a simulation, no target trait could be followed. All the conditions were adjusted so that they would be practical and fit into the daily laboratory routine. The P2 recurrent genotype recovery averages in three backcross generations were compatible with those expected for the fourth and fifth backcrosses (BC4 and BC5), which shows that marker-assisted backcrossing produced a genetic gain as regards P2 recurrent parent recovery, a result that agrees with previously published work (Hospital et al., 1992; Frisch et al., 1999a; Ragot et al., 1994; Jarboe et al., 1994). Hospital et al. (1992) have shown that increasing the number of markers to more than three per non-carrier chromosome was not efficient in early generations, but Ribaut et al. (2002) have pointed out that, because there is an increased probability of crossover in later generations, an increase in the number of markers should be considered as a way of optimizing selection.
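The comparison with later backcross generations can be made concrete by inverting the expectation 1 - (1/2)^(n+1) for the recurrent genome fraction after n unselected backcrosses. The sketch below is an illustration with a hypothetical function name; the 96.6% input is the BC3 average reported in the Results.

```python
import math


def equivalent_backcross_generation(recovery_percent):
    """Unselected backcross generation n whose expected recovery equals the input."""
    if not 0.0 <= recovery_percent < 100.0:
        raise ValueError("recovery must be in [0, 100)")
    donor_fraction = 1.0 - recovery_percent / 100.0
    # Solve 1 - 0.5 ** (n + 1) = recovery for n.
    return math.log2(1.0 / donor_fraction) - 1.0


print(equivalent_backcross_generation(93.75))            # 3.0
print(round(equivalent_backcross_generation(96.6), 2))   # 3.88
```

Under this expectation, the observed BC3 average of 96.6% sits close to the 96.875% expected for an unselected BC4.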
The approach outlined in this paper was able to transfer the target allele, the efficiency of transfer at both the molecular and phenotypic level being higher when six markers per chromosome were used. The use of six markers also resulted in a greater level of precision for P2 recurrent genotype recovery estimates under different conditions of maize genome saturation. Alleles derived from the P1 donor parent were not detected under non-saturated conditions with six markers per chromosome, while under saturated conditions donor alleles were still detected in unassayed bins for those plants that had been selected using a low number of markers per chromosome.
When the number of markers per chromosome arm was increased there was a reduction in the number of significant tests, indicating that increasing the number of markers during marker-assisted backcrossing resulted in plants that were closer to the P2 recurrent parent. The SC3 criterion (six markers per chromosome) was shown to be superior to the SC1 and SC2 criteria in that it presented the lowest number of significant tests and the fewest plants with significant tests for more than one trait simultaneously. At the phenotypic level a high number (i.e. 6) of markers per chromosome increased the efficiency of MAB in such a way that the final SC3/BC3 selected plants were closer to the P2 parental type than any of the plants selected using the SC1 and SC2 criteria.
This practical approach could be fully optimized by associating field selection for enhanced parental genotype recovery evaluation with a specific transferred phenotypic trait. We agree with the observation by Ribaut et al. (2002) that the application of a backcross marker-assisted strategy for practical experiments should be on a case-by-case basis and that it is important to consider the nature of the germplasm involved.