Accuracy of Across-Environment Genome-Wide Prediction in Maize Nested Association Mapping Populations

Most of previous empirical studies with genome-wide prediction were focused on within-environment prediction based on a single-environment (SE) model. In this study, we evaluated accuracy improvements of across-environment prediction by using genetic and residual covariance across correlated environments. Predictions with a multienvironment (ME) model were evaluated for two corn polygenic leaf structure traits, leaf length and leaf width, based on within-population (WP) and across-population (AP) experiments using a large maize nested association mapping data set consisting of 25 populations of recombinant inbred-lines. To make our study more applicable to plant breeding, two cross-validation schemes were used by evaluating accuracies of (CV1) predicting unobserved phenotypes of untested lines and (CV2) predicting unobserved phenotypes of lines that have been evaluated in some environments but not others. We concluded that (1) genome-wide prediction provided greater prediction accuracies than traditional quantitative trait loci-based prediction in both WP and AP and provided more advantages over quantitative trait loci -based prediction for WP than for AP. (2) Prediction accuracy with ME was significantly greater than that attained by SE in CV1 and CV2, and gains with ME over SE were greater in CV2 than in CV1. These gains were also greater in WP than in AP in both CV1 and CV2. (3) Gains with ME over SE attributed to genetic correlation between environments, with little effect from residual correlation. Impacts of marker density on predictions also were investigated in this study.


KEYWORDS
best linear unbiased prediction GenPred genetic correlation maize ridge regression shared data resources Traditional quantitative trait loci (QTL)-based prediction (QP) in marker-assisted selection (MAS) is defined as a two-step process to predict breeding values (BVs) of untested lines for traits of interest based on QTL identified from a typical biparental plant breeding population. In the first step, QTL are identified based on phenotypic and genotypic data via various QTL mapping methods such as stepwise regression (Lande and Thompson 1990), interval mapping (Lander and Botstein 1989;Haley and Knott 1992), composite in-terval mapping (CIM) (Jansen 1993;Zeng 1993Zeng , 1994Wang et al. 1999), and inclusive CIM (Li et al. 2007). In the next step, identified QTL passing a particular significance threshold are included in a single model, and the combined effects of all QTL alleles are estimated simultaneously by either maximum likelihood or multiple linear regression-based methodologies (Edwards and Johnson 1994;Meuwissen et al. 2001;Bernardo and Yu 2007). These effects are then used in MAS schemes to predict BVs of untested individuals based solely on genotypic data in a subsequent off-season nursery or green house selection (Lorenzana and Bernardo 2009). The accuracy of QP is contingent on the numbers of QTL identified and their respective estimated effects. Although it is relatively easy to detect QTL with large phenotypic effects, it is difficult to identify QTL with intermediate or small effects (Lander and Botstein 1989;Xu 2003). This difficulty may be attributed to limited sample sizes of mapping populations and low heritabilities of target traits. Lack of sufficient replication and precision phenotyping also was indicated as possible reasons for low power of QTL mapping. Regardless, the power of QTL discovery determines the prediction accuracy during MAS and is responsible for the concomitant decrease in overall MAS efficiency. It is well known that QTL discovery in biparental mapping populations with a limited number of progeny (, 300) tends to overestimate individual QTL effects, regardless of statistical methodologies for effect assignment (the Beavis effect, see Beavis 1994). In severe situations, QTL effects cannot be estimated when genotypes of QTL are highly correlated or when population sizes are too small relative to the total number of putative QTL considered in the model. Therefore, from the perspective of statistical genetics, QP has focused on two key aspects: improving QTL mapping power and increasing the accuracy of QTL effects estimation. Although many statistical approaches have been developed for both purposes, improvement of QP accuracy for quantitative traits controlled by numerous small-effects QTL remains a current problem (Bernardo 2001;Heffner et al. 2009).
Simulation and empirical studies have shown that genome-wide prediction (GWP) (Meuwissen et al. 2001) provides improved accuracy in predicting BVs of untested lines over QP by the use of genomic marker trait associations in plants (Piyasatian et al. 2007;Lorenzana and Bernardo 2009;Zhong et al. 2009;Crossa et al. 2010;Heffner et al. 2011;Guo et al. 2012;Riedelsheimer et al. 2012;Zhao et al. 2012), animals (Lee et al. 2008;Legarra et al. 2008;Hayes et al. 2009a;Luan et al. 2009;Moser et al. 2009;Rolf et al. 2010;Mujibi et al. 2011;Wolc et al. 2011), and humans (Makowsky et al. 2011). With GWP, previously unidentified QTL with small effects in QP are captured in generated predictive models. This was shown to lead to significant increases of accuracies of derived predictions. In addition, to reduce the inflation of effect estimation of QTL, effects of each marker are treated as random draws from a common prior density in GWP. In practice, these assumptions of a prior probability distribution are leveraged to limit large fluctuations in marker effect estimates, inducing shrinkage estimates of marker effects. While ridge regressionbest linear unbiased prediction (RR-BLUP), BayesA and BayesB approaches originally were developed to improve assessments of QTL effects for GWP strategies (Meuwissen et al. 2001), more attention has been directed toward extending and improving these methods to accommodate various types of study populations and to improve predictive ability (Lee et al. 2008;de los Campos et al. 2009de los Campos et al. , 2010Meuwissen et al. 2009;Piepho 2009;Hayes et al. 2009b;Hayashi and Iwata 2010;Habier et al. 2011;Ober et al. 2011;González-Camacho et al. 2012).
However, the majority of the aforementioned studies apply to a univariate single-environment (SE) prediction model in which phenotypic records or means of a training sample and a validation sample are obtained from the same set of environments by using the same year and location effects, with no consideration of genetic and residual correlation across environments. Recent studies have illustrated that the use of genetic and residual covariance across correlated environments may better the accuracy of across-environment GWP using a multienvironment (ME) model (Calus and Veerkamp 2011;Burgueño et al. 2012). However, it is still not clear what the actual gain in prediction accuracy with ME over SE in typical biparental populations in plant breeding would be and what is the key factor contributing to the gains. Therefore, objectives in this study are threefold: (1) Evaluate the accuracies of across-environment predictions with QP and GWP based on within-population (WP) and across-population (AP) experiments with SE and ME models by using data from a large NAM data set consisting of 25 bi-parental half-sib populations; (2) assess influences of different genetic and residual covariance structures on the prediction accuracy of GWP with different ME models; and (3) conduct an assessment the effects of variation in marker density on GWP with the SE and ME models. To make our study more practical and applicable to plant breeding, two cross-validation schemes were employed to achieve the above objectives by evaluating the accuracies of (CV1) predicting unobserved phenotypes of untested lines and (CV2) predicting unobserved phenotypes of lines that have been evaluated in some environments but not others.

NAM population
We retrieved phenotype and genotype data of 4131 recombinant inbred lines (RILs) from maize NAM populations derived from crosses between 25 genetically diverse inbreds and the maize elite parent B73 (Table 1) from the Panzea website (http://www.panzea. org). A total of 1106 single-nucleotide polymorphism (SNP) markers were used for genotyping each RIL, covering a genetic map of 1439 cM. The coding rules used for genotypic data were identical to those used in the within-environment prediction shown in Guo et al. (2012). To summarize, for individual populations, genotypes of each SNP of each progeny were coded 21 if both alleles were from a diverse parental line, 1 if from the common parent B73, and 0 otherwise. Based on coding rules, when several SNPs in complete linkage disequilibrium (LD) were available for a chromosomal region, only one SNP was selected. In each NAM population, selected SNPs varied from 785 to 895 whereas the genome coverage ranged from 1371 to 1397 cM. In the current study, we used phenotypic data of each RIL evaluated for leaf structure traits leaf length (LL) and leaf width (LW) from two locations, Aurora (AU), NY, and Champaign-Urbana (CU), IL, over 2 yr (2006 and 2007). A total of four environments were then defined as E1: AU at 2006; E2: CU at 2006; E3: AU at 2007; and E4: CU at 2007. Both LL and LW traits were found to have a genetic architecture affected by a large number of QTL of small effect (Tian et al. 2011). Based on phenotypic information from 2 yr and two locations, the broad sense heritability H 2 on a mean basis (Supporting Information, File S1) ranged from 0.60 to 0.84 for the LL trait and from 0.60 to 0.82 for the LW trait (Table S1). Over 25 NAM populations, the average of H 2 was 0.74 for both traits, reflecting a high precision of phenotyping efforts.

Models for estimation of marker effects
Two methods, CIM and RR-BLUP, were used to estimate marker effects based on genotypic and phenotypic data in a training data set in CV1 with balanced phenotypic records and CV2 with unbalanced phenotypic records which were discussed in details in next section. Prediction accuracies with CIM and RR-BLUP were used to measure estimates with QP and GWP strategies, respectively. Two models, SE and ME, were applied in both methods based on training sets obtained from WP and AP. In total, there were four models developed for each method: AP-ME, AP-SE, WP-ME, and WP-SE. In the discussion of each method, we start with the most complex model, AP-ME, and then simplify to obtain others.
The AP-ME model with CIM was extended from multi-trait QTL mapping (Jiang and Zeng 1995) to accommodate multipopulation analysis, which can be written as where Y = [y 1 , y 2 , . . ., y n ] T is an n · m matrix of phenotypic data for n lines from p NAM populations and m environments, and y 1 , y 2 ,. . .,y n are vectors composed of observations across m environments; X is an n · (1 + p) incidence matrix for n lines from p populations representing population structure with the first column ones; b is a (1 + p) · m matrix of individual population effects with the first column representing the overall mean; W is an n · f matrix of genotypes of cofactor markers; s is an f · m matrix of cofactor marker effects with f the number of cofactor markers; Q is an n · 1 matrix of QTL genotypes; a is an 1 · m matrix of additive effects of a putative QTL at a tested position; and e = [e 1 , e 2 , . . ., e n ] T is a matrix of residuals e i (i = 1, 2, . . ., n), which is assumed to be correlated between environments and to follow a multivariate normal distribution with means zero and covariance matrix In this model, the population effects b, cofactor effects s, and QTL additive effects a were treated as fixed effects and residual e as random effects. QTL genotype Q was not observed and was replaced with its expected value obtained from the probability distribution of QTL genotypes conditional on the closest flanking markers (Haley and Knott 1992). Cofactor markers were selected by a modified stepwise selection procedure (Buckler et al. 2009). Extensions of this model to others were straightforward. For WP-ME, model (1) with population structure excluded from X became a single-population ME analysis (Jiang and Zeng 1995). AP-SE was similar to model (1) but was reduced to a univariate regression model for SE analysis, which was further simplified into a WP-SE model without population structure (Zeng 1993(Zeng , 1994. To deal with unbalanced phenotypic records in a training data set, the modified CIM method developed by Guo and Nelson (2008) was applied to WP-ME and AP-ME in CV2 by iteratively imputing missing phenotypic data conditional on their posterior distributions.
On the basis of the aforementioned models, CIM was performed by scanning whole genomes with a fixed step size of 1 cM based on genotypic and phenotypic data in a training data set at each replicate of CV1 and CV2. A QTL was identified at the position in which the test statistic logarithm of odds (LODs) score assumed its maximum in the region under consideration with a LOD threshold (Utz et al. 2000). Cofactor selection in CIM requires a significance level that we estimated to be 0.0001 based on permutation tests for AP-SE and AP-ME (Buckler et al. 2009;Tian et al. 2011) and relaxed to 0.01 with WP-SE and WP-ME due to the decreased training sample size. Another key requirement was to determine LOD thresholds for identifying QTL. In the current study, LOD thresholds were estimated from 5000 permutation tests based on a genome-wide significance level 0.05 (Doerge and Churchill 1996). In WP-SE and AP-SE, phenotypic records for each RIL were randomized within each NAM population, whereas in WP-ME and AP-ME, permutation tests were performed by shuffling phenotypic records for multiple environments at once to preserve their correlation structure. Note that permutation tests were conducted for each model in CV1 and CV2 based on full data sets from each of the 25 NAM populations. Once QTL were identified using empirical LOD thresholds, effects of QTL were estimated by a multivariate multiple regression model with population structure.
The AP-ME model with RR-BLUP was an extension from the model proposed by Burgueño et al. (2012) as n þ 2 6 6 6 6 6 6 6 6 6 6 6 4 where y i (i = 1, 2,. . ., m) is a vector of phenotypic data for n lines from p NAM populations at the environment i; m is the total number of environments; X i is an n · (1 + p) incidence matrix for n lines and p populations representing the population structure with the first column ones; b i is a vector of individual population effects (including the overall mean); Z i is an n · k matrix of marker genotypes with k the total number of markers; a i is a vector of marker effects; and e i is a vector of residuals. The above model can be rewritten in reduced matrix form as . ., a m T ] T ; and e T = [e 1 T , e 2 T , . . ., e m T ] T . In this model, the population effects b were treated as fixed effects. The marker effects a were treated as random effects following a multivariate normal distribution Nð0; G 0 5I K Þwith G 0 the additive genetic covariance matrix between a and a9, I K a k · k identity matrix, and 5 the Kronecker product of matrices. G 0 can be defined as where s ii 2 is the genetic variance of lines in environment i (i = 1, 2, . . ., m), r ij (i = 1, 2, . . ., m, and j = 1, 2, . . ., m) is the genetic correlation of lines between environment i and j; and r ij = r ji . The genetic covariance matrix of Y can be defined as where A is an n · n genetic relationship matrix, which can be calculated using genome-wide markers as where M is an n · k matrix of marker genotypes and p i is the frequency of an allele at locus i (i = 1, 2, . . ., k) (VanRaden 2008). Note that the matrix M is equivalent to Z i (i = 1, 2, . . ., m) in the CV1 scheme. The residuals e were considered as random effects following a multivariate normal distribution N(0, R) where R is a residual covariance matrix calculated by where I N is an n · n identity matrix, and R 0 is an m · m matrix of the residuals in different environments, written as where e ii 2 is the residual variance in environment i (i = 1, 2, . . ., m), r ij (i = 1, 2, . . ., m; j = 1, 2, . . ., m) is the residual correlation of lines between environment i and j, and r ij = r ji . When model (2) was modified into AP-SE, it was reduced to a univariate model, and G 0 and R 0 became a special case of the multivariate model with diagonal entries denoting the genetic and residual variances, respectively. In WP-SE and WP-ME, the population structure X was reduced to the column vector of ones. These models were applied to estimate marker effects based on training data sets with balanced phenotypic records in CV1. For training sets with unbalanced phenotypic records in CV2, marker genotype matrix Z i in environment i may not be equal to Z j in environment j. The genetic covariance between line l and q across environment i and j in where Z il is the genome-wide marker genotype row vector for line l in environment i, Z jq is the genome-wide marker genotype row vector for line q in environment j, s ij 2 is the entry of G 0 at row i and column j, and p h is the frequency of an allele at locus h (h = 1, 2, . . ., k). The residual covariance was R ijlq = {e ij 2 } with e ij 2 the element at the ith row and the jth column of R 0 if line l in environment i and line q in environment j are from the same genotype, and zero otherwise, A key requirement in model (2) was to estimate the covariance matrix G 0 and R 0. In this study, G 0 and R 0 were estimated by the multivariate restricted maximum likelihood (REML) approach proposed by Vattikuti et al. (2012) based on model (2). Because estimates of variance components from multivariate REML were sensitive to initial values, estimates from univariate and bivariate REML were used as the initial values for the multivariate REML analysis. Although G 0 and R 0 were estimated in each NAM population for WP, they were estimated once by combining full data sets from all NAM populations for AP. Once estimation for G 0 and R 0 were done, environmentspecific marker effects a along with the population effects b were estimated by solving Henderson's mixed-model equations (Henderson 1984).

Cross-validation
We used validation methods to estimate the prediction accuracy of each model described previoulsy. Cross-validation is a statistical technique of splitting data into training and validation sets by using the validation set to evaluate prediction ability of the trained model. Two cross-validation schemes, CV1 and CV2 (Figure 1), were considered in the study based on different prediction purposes relevant to practical breeding problems (Calus and Veerkamp 2011;Burgueño et al. 2012). In CV1, the goal was to predict unobserved phenotypes of untested lines or newly generated lines based solely on genotypes; and in CV2, missing phenotypes of lines in an environment were predicted with genotypes and phenotypes from other environments. A total of 100 replicates of cross-validation were performed for each scheme, and the validation results were averaged over each replicate.
In each replicate of CV1 (Figure 1), RILs in each population were randomly split into a training data set comprising 60% of the samples and a validation data set of the remainder (Legarra et al. 2008;Lorenzana and Bernardo 2009;Guo et al. 2012). Phenotypic records of each line in the validation set were set to missing for all environments. In CV2 (Figure 1), a specific proportion (40%) of phenotypic records for each environment independently were set to missing at random. Therefore, phenotypic information was balanced for training sets generated in CV1, and unbalanced in CV2. Lines lacking phenotypic data for all environments were dropped in CV2. Theoretically, the proportion of these lines in each NAM population was (0.40) 4 = 2.56% across four environments. Excluding these lines had little effect on CV2 analysis.
In both CV1 and CV2 described above, training and validation sets were from a single population by the use of WP information. As an alternative practice, AP information can be used for prediction in the absence of WP information (Legarra et al. 2008;Zhao et al. 2012), mimicking a framework of using historical data to predict line performances of a cross. In CV1, the AP training set was generated by combining all RILs from the other 24 NAM populations. In CV2, the AP training set was generated with RILs of 25 NAM populations including the test population. In this case, each line in the training set contained unbalanced phenotypic records. It is worthy of mentioning that both WP and AP were used to predict performances of lines from the identical validation set in CV1 and CV2, respectively.
Although training was performed with WP and AP approaches in CV1 and CV2, validation was conducted within each NAM population for each environment. In each replicate, prediction accuracies were measured as the correlation coefficient between estimated BVs and observed phenotypes of lines in the validation set usinĝ where ŷ i is the estimated estimated BV of individual i in the validation sample;m andâ j are environment-specific overall mean and marker effects estimated from a training sample using the methods discussed in the last section; and z ij is the genotype of marker j for line i in the validation set. Note that only QTL were used in the model with the QP strategy, while all the markers were included for the GWP strategy. The final reported prediction accuracy was in fact the mean of the 100 predictions generated across the replicate runs. Overall accuracies between various models were compared with a pairwise t-test (a = 0.05) based on identical validation sets within each NAM population. Gains in prediction accuracy with one model (e.g., model A) over another one (e.g., model B) were calculated using (R A -R B ) / R B , where R A represents prediction accuracy with model A and R B prediction accuracy with model B.
Impacts of genetic and residual covariance structure on GWP Impacts of genetic and residual covariance on GWP with ME were assessed by comparing four different models: (1) structured genetic and residual covariance (SG-SR); (2) structured genetic and unstructured residual variance (SG-UR); (3) unstructured genetic and structured residual variance (UG-SR); and (4) unstructured genetic and residual covariance (UG-UR). SG-SR was the simplest model among them, in which each of the nondiagonal entries in G 0 and R 0 estimated above were set to zero, imposing independence on genetic and residual correlations. This model was equivalent to fitting each SE model separately with GWP. SG-UR imposed independence on genetic correlation by setting non-diagonal entries to zeros in G 0 , while UG-SR removed residual correlation by setting non-diagonal entries to zeros in R 0 . UG-UR was completely unstructured and estimated in the previous section. Influences of residual covariance were evaluated by comparing SG-SR with SG-UR, whereas impacts of genetic covariance were assessed by comparing UG-SR with SG-SR. Finally, effects of residual covariance given the genetic covariance were evaluated by comparing UG-UR with UG-SR.
Effect of marker density on prediction accuracy of GWP Effects of marker densities on GWP were tested by setting a conditional genetic distance criterion (c) between two flanking markers to ensure that each chromosome was evenly covered by a set of SNPs. For each NAM population, varying numbers of markers were obtained by selecting values of c (c = 1.6, 5, 10, 15, 20, 25, and 30 cM) without compromising the overall genomic coverage using the method described by Guo et al. (2012). This was done to observe the influence of using smaller marker sets than all available data from the original NAM study. All the identified polymorphic markers were included in the model at c = 1.6 cM, meaning that the mean marker distance of the entire marker set was~1.6 cM. Note that a training sample proportion 0.6 was always used when assessing the effect of different marker densities to ensure different densities were applied to the same training and validation sets at each replicate of 100 cross-validations. Table 2. Prediction accuracy in each cell in this table was the average of prediction accuracies over 25 NAM populations (Table S7, Table S8,  Table S9, Table S10, Table S11, Table S12, Table S13, Table S14, Table  S15, Table S16, Table S17, Table S18, Table S19, Table S20, Table S21, and Table S22). As discussed previously, QTL for QP were identified using the empirical LOD thresholds estimated by permutation tests (Table S2). With GWP, the genetic and residual covariances obtained from multivariate REML for traits LL and LW (Table S3, Table S4,  Table S5, and Table S6) were used to estimate environment-specific   a In parentheses is the number of QTL identified by QP based on the SE model.

Accuracies of predictions for traits LL and LW are shown in
b In parentheses is the gain in prediction accuracy with GWP over QP based on the SE model. c The first value in parentheses is the number of QTL identified by QP based on the ME model; and the second one the gain with ME over SE for QP. d The first value in parentheses is the gain in accuracy with GWP over QP based on the ME model; and the second one is the gain in accuracy with ME over SE using GWP.
marker effects with RR-BLUP. We first compared GWP with QP in CV1 and CV2. Overall, GWP gave consistently greater accuracies than QP. Increase in accuracy with GWP over QP was statistically significant in 99% of predictions generated (Table S7, Table S8, Table S9,  Table S10, Table S11, Table S12, Table S13, Table S14, Table S15,  Table S16, Table S17, Table S18, Table S19, Table S20, Table S21, and Table S22). Gains in accuracies with GWP over QP were larger for WP than for AP were attributed to two factors: (1) GWP with WP gave greater accuracies than that with AP; and (2) QP with WP gave lower accuracies than that with AP due to a decreased number of QTL identified in WP. Although AP gave improved accuracies for QP over WP, it still yielded lower predictive power than GWP, which was also lower than the accuracies with GWP in WP. Afterward we compared prediction accuracies between the SE and ME models for GWP. The ME model provided greater prediction accuracies than the SM model in both CV1 and CV2. Increases in accuracies were significant in most cases (Table S7, Table S8, Table S9,  Table S10, Table S11, Table S12, Table S13, Table S14, Table S15,  Table S16, Table S17, Table S18, Table S19, Table S20, Table S21,  and Table S22). For WP in CV1, across all populations and environments, average gains in accuracies with ME over SE were 8% and 12% for traits LL and LW, respectively. Gains attained a maximum of 48% in some populations in CV1 (Table S7). For WP in CV2, average increases with ME over SE reached 40% and 33% for LL and LW traits, respectively. For AP, gains with ME over SE were reduced to 3% in CV1 and 10-13% in CV2 for LL and LW traits. On average, across all populations, environments, and traits, gains with ME over SE were 10% and 37% for WP, and 3% and 11.5% for AP in CV1 and CV2, respectively. The aforementioned gains were observed based on the average genetic correlation of 0.77 and 0.87 and the average residual correlation of 0.31 and 0.39 for WP and AP, respectively (Table S3,   Table S4, Table S5, and Table S6). With different training and validation sets, it may not be appropriate to evaluate performances of GWP between CV1 and CV2 on the same ground in this study. However, given similar accuracies with the SE model obtained from CV1 and CV2, comparison between them suggested, but did not prove, that gains with ME over SE were greater in CV2 than that in CV1 at WP and AP.
Prediction accuracies for GWP using ME with different genetic and residual variance structures are summarized in Table 3. Similar to  Table 2, each of the accuracies shown in this table was the average of prediction accuracies over 25 NAM populations. Detailed results can be found in Table S23, Table S24, Table S25, Table S26, Table S27,  Table S28, Table S29, Table S30, Table S31, Table S32, Table S33,  Table S34, Table S35, Table S36, Table S37, and Table S38. In CV1 and CV2, SG-UR gave prediction accuracies comparable to or less than that attained by SG-SR, suggesting that addition of residual covariance did not benefit predictions, and may have worsened them. In contrast, prediction accuracies with UG-SR were greater than that with SG-SR, suggesting that modeling genetic covariance improved predictions with the ME model. Finally, UG-UR gave comparable prediction accuracies to UG-SR, suggesting that adding residual covariance had little effect on predictions in the presence of the genetic covariance.
Marker densities of 1. 6,5,10,15,20,25, and 30 cM were used corresponding to different numbers of markers in each training sample via methods described in the previous section. As expected, prediction accuracies for GWP with SE and ME models improved with increasing marker density in the example using the NAM population B73·CML322 (Figure 2). These improvements tended to diminish to zero when marker density exceeded a threshold of 10 cM, suggesting that the density selected based on an interval size of 10 cM n was sufficient to capture LD between QTL and markers in these biparental segregating populations, similar to reports by Guo et al. (2012).

DISCUSSION
GWP provides a significant predictive advantage over QP in WP and AP, with increased gains in WP compared with AP. For almost all cases in WP, GWP shows significantly greater accuracy than QP in each of the NAM populations. This conclusion is consistent with the results from previous studies (Lorenzana and Bernardo 2009;Guo et al. 2012;Zhao et al. 2012). More interestingly, when training sets are changed from WP to AP, estimations with GWP are reduced, similar to previous reports in mice (Legarra et al. 2008) and maize (Zhao et al. 2012). The loss of accuracy from WP to AP for GWP may attribute to the differences of genetic backgrounds between training and validation populations. The differences may also affect QP and could reduce prediction accuracy from WP to AP for QP. However, in this study, we found a gain in accuracy with AP over WP. This could be explained by the different genetic basis used for prediction between GWP and QP. Although GWP relies on genetic relationship derived from genome-wide markers, QP uses a small subset of markers and focuses more on LD between these markers and QTL. As a result, QP may be less affected by different genetic backgrounds between training and validation sets than GWP. The potential loss caused by the different genetic backgrounds in QP could be compensated by the gain obtained from AP by improved QTL mapping power by utilizing multiple populations (Yi and Xu 2002;Blanc et al. 2006;Buckler et al. 2009;Tian et al. 2011). However, even with increased prediction accuracy with AP, QP still cannot achieve the high prediction accuracies attained by GWP with WP. This finding suggests, in practical breeding, to achieve high prediction accuracies, even with a high cost, it is still worthwhile to phenotype a proportion of lines from a breeding population to predict performances of genotypes generated from the subsequent intercross or backcross generations (Bernardo and Yu 2007). The use of across-environment information improves prediction with GWP. Two main results were obtained from this study. First, modeling covariances between correlated environments with the ME model gives better predictions compared with SE in both the CV1 and CV2 schemes. At an average genetic correlation of 0.77, gains of 10% in CV1 and 37% in CV2 were observed for WP, greater than that of 3% in CV1 and 11.5% in CV2 for AP at a correlation of 0.87. Although the superiority of ME over SE in CV2 can be explained by borrowing information from the same lines across environments (Burgueño et al. 2012), gains in CV1 may likely attribute to more accurate estimates of environment-specific marker effects by utilizing genetic correlation. Gains in CV1 and CV2 also were reported based on a simulation study in animal breeding with the multitrait model, similar to the ME model tested in this study. This gave increases in accuracies of 0.03 to 0.14 over the single-trait prediction model that is similar to the SE model at genetic correlation levels of 0.25 and 0.75 (Calus and Veerkamp 2011). Furthermore, we found that the gain in CV2 with ME over SE is greater than that in CV1. This conclusion is also consistent with previously reported results (Calus and Veerkamp 2011;Burgueño et al. 2012). Second, by modeling and comparing different genetic and residual covariance structures, we found that gains with ME over SE are attributed to genetic covariance in CV1 and CV2, with little or negative contribution from the residual covariance. This finding indicates that accurately estimating the genetic covariance structure between correlated environments is critical to improve prediction accuracy with ME models. Although these results were obtained with relatively simple covariance structures from only four environments, more studies need to be conducted to confirm if these gains reflect a general superiority of the ME model with a large number of environments.
Impacts of marker density also were evaluated for GWP with the SE and ME models. Overall, our conclusions were consistent with previous work from within-environment predictions (Lorenzana and Bernardo 2009;Heffner et al. 2011;Guo et al. 2012;Zhao et al. 2012). Results indicate that a marker density of 10 cM, approximately corresponding to 150 markers, should be used when designing a predictive breeding strategy with GWP in a biparental breeding population. Although this result is consistent with findings in the previous studies in plant breeding (Lorenzana and Bernardo 2009;Guo et al. 2012), it has to be kept in mind that conclusions were drawn based on the typical biparental segregating populations with extensive LD caused by limited recombination. In the context of animal and human GWP where the populations studied are generally composed of complicated pedigreed individuals, a greater marker density is required to capture the small LD caused by accumulated historical recombinations. It was reported that at least 80,000 SNPs in human (Makowsky et al. 2011) and 5000 to 7500 SNPs in animals (Vazquez et al. 2010) are needed to reach a high prediction accuracy with GWP. Therefore, marker densities may be determined mainly by LD structure in different types of breeding populations. Also the goals and resources of a breeding project may affect decisions.
In practical plant breeding, GWP needs to be integrated into the appropriate genome-wide selection schemes where several cycles of intercrossing may be required to increase genetic gains (Bernardo and Yu 2007; Jannink et al. 2010). The CV1 scheme discussed in this study will mainly serve this purpose, and BVs of newly generated genotypes will be predicted and used for selection. Though there is no need for QTL identification in GWP, this does not suggest that the QTL ideotype construction strategy deployed in marker-assisted recurrent selection scheme cannot be implemented via GWP (Nakaya and Isobe 2012). At its simplest, the construction of QTL ideotypes may be accounted for by BVs obtained from GWP. For example, these BVs can be used in the QTL introgression process to efficiently pyramid 3 to 30 QTL as demonstrated by Bernardo (2009). In contrast, the CV2 scheme investigated in the current paper may be used to predict missing phenotypes caused by random missingness for a same set of genotypes evaluated across multiple environments. Overall, when the focus of a complex trait study is migrated from traditional line performances to allelic effect evaluation (Heffner et al. 2011), prediction of BVs with GWP may be further improved in CV1 and CV2 schemes. To maximize genetic gains in varied breeding programs per unit time and cost, it will be critical to design different genome-wide selection strategies to fully take advantage of the high accuracy of BVs at key stages of marker-assisted breeding.