Prediction of Means and Variances of Crosses With Genome-Wide Marker Effects in Barley

Background: The expected genetic variance is an important criterion for the selection of crossing partners which will produce superior combinations of genotypes in their progeny. The advent of molecular markers has opened up new vistas for obtaining precise predictors for the genetic variance of a cross, but fast prediction methods that allow plant breeders to select crossing partners based on already available data from their breeding programs without complicated calculations or simulation of breeding populations are still lacking. The main objective of the present study was to demonstrate the practical applicability of an analytical approach for the selection of superior cross combinations with experimental data from a barley breeding program. We used genome-wide marker effects to predict the yield means and genetic variances of 14 DH families resulting from crosses of four donor lines with five registered elite varieties with the genotypic information of the parental lines. For the validation of the predicted parameters, the analytical approach was extended by the masking variance as a major component of phenotypic variance. The predicted parameters were used to fit normal distribution curves of the phenotypic values and to conduct an Anderson-Darling goodness-of-fit test for the observed phenotypic data of the 14 DH families from the field trial. Results: There was no evidence that the observed phenotypic values deviated from the predicted phenotypic normal distributions in 13 out of 14 crosses. The correlations between the observed and the predicted means and the observed and predicted variances were r = 0.95 and r = 0.34, respectively. After removing two crosses with downward outliers in the phenotypic data, the correlation between the observed and predicted variances increased to r = 0.76. A ranking of the 14 crosses based on the sum of predicted mean and genetic variance identified the 50% best crosses from the field trial correctly. 
Conclusions: We conclude that the prediction accuracy of the presented approach is sufficiently high to identify superior crosses even with limited phenotypic data. We therefore expect that the analytical approach based on genome-wide marker effects is applicable in a wide range of breeding programs.


INTRODUCTION
Selection gain in breeding programs relies on the selection of suitable crossing partners which will result in derived lines with superior performance. The best cross is not necessarily the cross with the greatest mean performance, but the cross of which the best lines show the highest performance (Zhong and Jannink, 2007). Looking at the criteria which have been suggested to evaluate the potential of a certain cross to generate high-performing progeny, such as the usefulness criterion $U = \mu + i h \sigma_g$ (Schnell and Utz, 1975) or the superior progeny value $s = \mu + i \sigma_g$ (Zhong and Jannink, 2007), it becomes clear that the expected genetic variance within a cross is the key factor for identifying the best crosses. Nevertheless, strategies for identifying superior crosses in applied breeding programs have so far mostly relied on pedigree information, mid-parent performance and phenotypic evaluation (Lado et al., 2017). The main reason why the selection of crosses on the basis of their progeny variance has not yet been widely implemented in plant breeding programs is that, before the advent of molecular markers, there were only limited possibilities of obtaining sufficiently precise predictors for these genetic variances.
In the era of high-throughput genotyping and genomic selection, recent research has focused on obtaining predictors for the genetic variance from genome-wide marker estimates by either simulations (Bernardo, 2014; Lian et al., 2015; Mohammadi et al., 2015) or analytical approaches (Zhong and Jannink, 2007; Bonk et al., 2016; Lehermeier et al., 2017). Versatile analytical methods that allow plant breeders to make a fast selection of superior crossing partners based on already available genotypic and phenotypic data from their breeding programs, without the need for reparametrization of estimated marker effects, complicated calculations, or simulation of breeding populations, promise to improve the efficiency of breeding programs. In a previous study, we presented an analytical approach for the prediction of the means and genetic variances of crosses based on marker effects estimated by methods of genomic selection that works for arbitrary mapping functions and mating systems (Osthushenrich et al., 2017). First promising results of cross prediction with analytical approaches were published for simulated populations or multiparental mapping populations (Bonk et al., 2016; Lado et al., 2017; Lehermeier et al., 2017; Osthushenrich et al., 2017). However, as the design of mapping populations deviates from the design of typical breeding populations, the practical applicability of the analytical approaches in plant breeding populations remains to be demonstrated. To our knowledge, no studies are available which investigate the application of analytical approaches for cross prediction for agronomically important complex quantitative traits with data from actual breeding populations.
The aims of the present study were to apply the analytical formulas for prediction of the means and variances of crosses by Osthushenrich et al. (2017) to a data set from a resistance breeding project in barley, and to investigate the model fit for yield in 14 families of doubled haploid (DH) lines derived from crosses of four pre-breeding lines and five registered commercial elite varieties. Our objective was to investigate the practical relevance and applicability of our analytical approach for the identification of superior cross combinations in plant breeding programs.

Field Data
An augmented design with five blocks was used to evaluate all genotypes for yield in one year at five locations in Germany with one replication per location. The field experiment was carried out in Adenstedt (State Niedersachsen, Region Südhannover), Harzhof (State Schleswig-Holstein, Region Ost-Holstein), Irlbach (State Bayern, Region Niederbayern), Lenglern (State Niedersachsen, Region Südniedersachsen), and Morgenrot (State Sachsen-Anhalt, Region Östliches Harzvorland). The parental lines were used as checks and were replicated five times. The field data were analyzed with the mixed linear model
Yield ∼ µ + Genotypes + Location + Location:Blocks + Error
where the common mean µ and genotypes were treated as fixed factors, whereas blocks, locations, and heteroscedastic model errors were assumed as random. The resulting adjusted entry means for yield for each DH line were used in further calculations.

Genotypic Data
All 250 resulting DH lines and the ten parental lines were genotyped with the 50 k iSelect Chip (Trait Genetics, Gatersleben). All SNP markers with more than two recorded alleles, more than 10% missing values, or a gene diversity of <10% were excluded from the analysis, as were all individuals with more than 15% missing marker information. As a result, 9,597 SNP markers and 259 genotypes (249 DH lines and 10 parental lines) remained for the analysis.
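The marker filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: gene diversity is assumed to be the expected heterozygosity 1 - Σ p_k², genotype calls are assumed to be one allele code per homozygous line with `None` for a missing call, and the function names and toy markers are hypothetical.

```python
from collections import Counter

def gene_diversity(calls):
    """Expected heterozygosity 1 - sum(p_k^2) over the observed alleles (assumed definition)."""
    n = len(calls)
    return 1.0 - sum((count / n) ** 2 for count in Counter(calls).values())

def keep_marker(calls, max_missing=0.10, min_diversity=0.10):
    """Return True if a marker passes the three filters named in the text.

    calls: one allele code per line (lines are homozygous), None = missing call.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    missing_rate = 1.0 - len(observed) / len(calls)
    if missing_rate > max_missing:        # more than 10% missing values
        return False
    if len(set(observed)) > 2:            # more than two recorded alleles
        return False
    return gene_diversity(observed) >= min_diversity  # drop gene diversity < 10%

# Hypothetical toy markers scored on 20 lines
markers = {
    "m1": ["A"] * 10 + ["B"] * 10,             # balanced, biallelic
    "m2": ["A"] * 19 + ["B"],                  # gene diversity 0.095
    "m3": ["A"] * 8 + ["B"] * 8 + [None] * 4,  # 20% missing
    "m4": ["A"] * 7 + ["B"] * 7 + ["C"] * 6,   # three alleles
}
kept = [name for name, calls in markers.items() if keep_marker(calls)]
print(kept)  # ['m1']
```

The same pattern would apply per individual for the 15% missing-data threshold, by computing the missing rate across markers instead of across lines.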

Genomic Prediction of Marker Effects
For the prediction of marker effects, we used ridge-regression best linear unbiased prediction (RR-BLUP; Meuwissen et al., 2001). As training set for the prediction of marker effects, we used the complete genotypic and phenotypic data of the 249 DH lines from the 5 × 5 factorial which remained after data cleaning.
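To make the shrinkage idea behind RR-BLUP concrete, the following minimal sketch solves the ridge normal equations (Z'Z + λI)β = Z'(y - µ) in plain Python. It is an assumption-laden illustration, not the SelectionTools implementation: the penalty λ is fixed by hand here (in practice it is usually derived from estimated variance components), markers are assumed to be coded -1/+1 with centered columns, and the data are invented.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def rr_blup(z, y, lam):
    """Ridge estimates of marker effects for a fixed penalty lam (assumed known)."""
    n, p = len(z), len(z[0])
    mu = sum(y) / n                       # overall mean (valid for centered marker columns)
    yc = [v - mu for v in y]
    ztz = [[sum(z[k][i] * z[k][j] for k in range(n)) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    zty = [sum(z[k][i] * yc[k] for k in range(n)) for i in range(p)]
    return mu, solve(ztz, zty)

# Hypothetical data: 4 lines, 2 markers coded -1/+1, y = 10 + 2*z1 + 1*z2
z = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [13.0, 11.0, 9.0, 7.0]
mu, effects = rr_blup(z, y, lam=4.0)
print(mu, effects)  # 10.0 [1.0, 0.5]
```

With λ = 4 the estimated effects are shrunk to half of the values used to generate the toy data, which is the behavior that keeps the p > n estimation problem well-posed.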

Prediction of Cross Parameters $\hat{\mu}_g$ and $\hat{\sigma}^2_g$
For the prediction of the expectation $\hat{\mu}_g$ and the genetic variance $\hat{\sigma}^2_g$ of the crosses, we used the analytical approach of Osthushenrich et al. (2017) and the marker effects estimated with RR-BLUP. The required recombination frequencies were derived from a published linkage map (Bayer et al., 2017). We used the genotypes of the ten parental lines to predict $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ of the resulting DH lines of the validation set.
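The kind of calculation involved can be sketched as follows. This is not a reproduction of the formulas of Osthushenrich et al. (2017), which cover arbitrary mapping functions and mating systems; it is a minimal sketch for the special case of DH lines derived directly from the F1 of two homozygous parents, assuming markers coded -1/+1, Haldane's mapping function, and the standard result that the covariance of the parental-origin indicators at two loci is 1 - 2r. All names and numbers are hypothetical.

```python
import math

def haldane_r(dist_morgan):
    """Recombination frequency from map distance (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * dist_morgan))

def predict_cross(p1, p2, effects, chrom, pos_morgan):
    """Predicted mean (as deviation from the intercept) and genetic variance of F1-derived DH lines.

    p1, p2: parental genotypes coded -1/+1 per marker (assumed coding).
    effects: additive marker effects on the same coding.
    """
    m = len(effects)
    mean_dev = sum(b * (a + c) / 2.0 for b, a, c in zip(effects, p1, p2))
    # d[i] is non-zero only for markers that segregate in the cross
    d = [b * (a - c) / 2.0 for b, a, c in zip(effects, p1, p2)]
    var = 0.0
    for i in range(m):
        for j in range(m):
            if d[i] == 0.0 or d[j] == 0.0:
                continue
            if chrom[i] != chrom[j]:
                r = 0.5                   # unlinked loci
            else:
                r = haldane_r(abs(pos_morgan[i] - pos_morgan[j]))
            var += d[i] * d[j] * (1.0 - 2.0 * r)
    return mean_dev, var

# Hypothetical cross: markers 1 and 2 segregate and are unlinked, marker 3 is fixed
result = predict_cross(
    p1=[1, 1, 1], p2=[-1, -1, 1],
    effects=[2.0, 1.0, 1.5],
    chrom=[1, 2, 2], pos_morgan=[0.0, 0.0, 0.1],
)
print(result)  # (1.5, 5.0)
```

The sign of d[i] * d[j] captures linkage phase: two completely linked loci in coupling add their effects before squaring, whereas loci in repulsion partly cancel, so the same parents' allele effects can yield very different predicted variances.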

Validation Set
For validating the prediction of $\hat{\mu}_g$ and $\hat{\sigma}^2_g$, we compared the predictions from the formulas with the observed family means $\bar{x}$ and phenotypic variances $s^2_p$ from the field trial. As validation set, we used the 200 DH lines resulting from the following 14 crosses: 146ETI, 146JEN, 146MER, 146OTT, 146QUA, ANTETI, ANTMER, ANTOTT, ANTQUA, D33ETI, D33MER, D33QUA, D37OTT, D37QUA. The remaining crosses did not result in viable offspring. For line 101, the resulting DH lines from all five crosses had to be excluded from the validation set, as the genotype of the parental line 101 did not match the genotype of the resulting DH lines, meaning that a problem with seed identification of the parental line had at some point occurred during the project. The final validation set thus comprised an unbalanced 5 × 4 factorial of 14 families of 200 DH lines in total (Table 1).

Comparison of Predicted $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ and Observed Parameters $\bar{x}$ and $s^2_p$
For comparing the predicted and the observed values from the field trial, we used the yield data of the validation set (Table 1). As the variance of the phenotypic data is defined as $\sigma^2_p = \sigma^2_g + \sigma^2_m$, the approach of Osthushenrich et al. (2017) was extended to the distribution of the phenotypic data by adding an estimate $s^2_m$ of the masking variance $\sigma^2_m$ to the predicted variance $\hat{\sigma}^2_g$. For this purpose, the masking variance $s^2_m$ was estimated as the square of the average standard error of the adjusted treatment means of the mixed model analysis of the field trial (Piepho and Möhring, 2007).
Due to the balanced design of the field trial, the estimated masking variance $s^2_m$ took the same value of 33.41 dt²/ha² for all 14 crosses. An Anderson-Darling goodness-of-fit test (Anderson and Darling, 1954) was carried out to test the null hypothesis that the observed yield values of each of the 14 DH families are a sample from a normal distribution $N(\hat{\mu}_g, \hat{\sigma}^2_g + s^2_m)$.
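The test described above compares each family against a fully specified normal distribution, so no parameters are re-estimated from the sample. The following sketch computes the Anderson-Darling statistic A² for that case. It is an illustration only: the yield values and the predicted mean and genetic variance are hypothetical, and instead of a p-value it uses the commonly tabulated 5% critical value of 2.492 for a fully specified null distribution.

```python
import math

def normal_cdf(x, mean, var):
    """Cumulative distribution function of N(mean, var)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def anderson_darling(values, mean, var):
    """A^2 statistic against a fully specified normal distribution N(mean, var)."""
    xs = sorted(values)
    n = len(xs)
    f = [normal_cdf(x, mean, var) for x in xs]
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1.0 - f[n - 1 - i]))
            for i in range(n))
    return -n - s / n

CRIT_5_PERCENT = 2.492  # tabulated 5% critical value, all parameters specified

# Hypothetical family: predicted mean and genetic variance plus the masking variance
mu_hat, var_g_hat, s2_m = 90.0, 10.0, 33.41
yields = [84.1, 86.5, 88.0, 89.2, 90.3, 91.0, 92.4, 93.8, 95.5, 97.9]

a2 = anderson_darling(yields, mu_hat, var_g_hat + s2_m)
a2_shifted = anderson_darling([v - 15.0 for v in yields], mu_hat, var_g_hat + s2_m)
print(a2 < CRIT_5_PERCENT, a2_shifted < CRIT_5_PERCENT)  # True False
```

The second call shifts all observations 15 dt/ha below the predicted mean, which the statistic flags clearly; a sample that is merely narrower than the predicted distribution, as in the first call, is much harder to reject with ten observations.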

Ranking of Crosses
To validate the identification of superior cross combinations with the analytical approach of Osthushenrich et al. (2017), we created a ranking of crosses based on the criterion $\hat{\mu}_g + \hat{\sigma}_g$. This predicted ranking of the crosses was compared to the ranking of crosses based on the best-performing DH line from each cross.
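The ranking comparison can be sketched as follows. The cross labels and all numbers are hypothetical placeholders, not values from the study; the sketch only shows how the criterion (predicted mean plus predicted genetic standard deviation) is turned into a ranking and how the top half is compared with the ranking by the best observed line.

```python
import math

def top_half(scores):
    """Names of the top-ranked 50% of entries (higher score is better)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[: len(ranked) // 2])

# Hypothetical predictions per cross: (predicted mean, predicted genetic variance)
predicted = {
    "C1": (95.0, 1.0),
    "C2": (90.0, 9.0),
    "C3": (85.0, 16.0),
    "C4": (88.0, 4.0),
}
# Ranking criterion: predicted mean plus predicted genetic standard deviation
criterion = {name: mu + math.sqrt(var) for name, (mu, var) in predicted.items()}

# Hypothetical yield of the best observed DH line per cross
observed_best = {"C1": 101.2, "C2": 99.8, "C3": 93.5, "C4": 95.1}

agreement = top_half(criterion) & top_half(observed_best)
print(sorted(agreement))  # ['C1', 'C2']
```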

Software
The statistical analysis of the field data was conducted in R version 3.4.2 (R Core Team, 2017). The estimation of marker effects as well as the prediction of the means and genetic variances of the crosses was conducted in R version 3.4.2 with the software package SelectionTools, which is freely available for download from our homepage. A code and output example is available in Figure 5.
FIGURE 2 | Marker-based predictions of the genetic means $\hat{\mu}_g$ and variances $\hat{\sigma}^2_g$ of the DH lines derived from all crosses of the complete factorial. The density of a normal distribution with the predicted genetic parameters $N(\hat{\mu}_g, \hat{\sigma}^2_g)$ is depicted in blue. The density of a normal distribution with the predicted phenotypic parameters $N(\hat{\mu}_g, \hat{\sigma}^2_g + s^2_m)$ is depicted in green, where $s^2_m$ is the estimated masking variance obtained as the square of the standard error of the adjusted phenotypic means of the field trial. For the crosses for which field data are available, the adjusted treatment means are marked with green dots and the respective family means $\bar{x}$ with red triangles. p is the p-value of the Anderson-Darling goodness-of-fit test for the null hypothesis that the observed adjusted treatment means are a sample of a normal distribution $N(\hat{\mu}_g, \hat{\sigma}^2_g + s^2_m)$.
Frontiers in Plant Science | www.frontiersin.org

RESULTS
The predicted mean yield performance $\hat{\mu}_g$ of the crosses ranged from 82.85 dt/ha (146ETI) to 97.31 dt/ha (ANTQUA) (Figure 2). The predicted genetic variances $\hat{\sigma}^2_g$ ranged from 0.96 dt²/ha² (ANTOTT) to 15.20 dt²/ha² (D37QUA). The differences in the predicted yield means $\hat{\mu}_g$ and the genetic variances $\hat{\sigma}^2_g$ were larger between crosses of the same elite variety with different donor lines (columns of Figure 2) than between crosses of the same donor line with different elite varieties (rows of Figure 2). For example, the crosses of the elite variety QUA with four donor lines showed a comparatively large variation of $\hat{\mu}_g$, which ranged between 85.93 dt/ha and 97.31 dt/ha (last row of Figure 2). The genetic variance $\hat{\sigma}^2_g$ also showed a comparatively large variation and ranged between 1.05 dt²/ha² and 15.20 dt²/ha². In contrast, for the five crosses with donor line 146, $\hat{\mu}_g$ for yield ranged only between 82.85 dt/ha and 85.93 dt/ha, and $\hat{\sigma}^2_g$ ranged only between 8.73 dt²/ha² and 12.12 dt²/ha² (first column of Figure 2). Crosses with donor line ANT, which is a highly resistant elite variety, displayed the overall highest values of $\hat{\mu}_g$ and the lowest values of $\hat{\sigma}^2_g$ (second column of Figure 2).
The crosses D33ETI and D33QUA showed downward outliers which resulted in high observed phenotypic variances $s^2_p$ of 36.57 dt²/ha² and 80.64 dt²/ha² (data not shown, but outliers visible in Figure 2). The phenotypic variances of the other twelve crosses with viable offspring ranged between 9.38 and 36.46 dt²/ha² (data not shown). The estimate of the masking variance based on the average standard error from the field data was $s^2_m$ = 33.41 dt²/ha² and thus was higher than the observed phenotypic variances for ten out of 14 crosses (data not shown).
The Anderson-Darling goodness-of-fit test indicated that there is no evidence to reject the null hypothesis that the observed yield values (Figure 2, green dots) are sampled from a normal distribution $N(\hat{\mu}_g, \hat{\sigma}^2_g + s^2_m)$ (green curves) in 13 out of 14 crosses. The exception was cross D33ETI, which featured downward outliers and a left-skewed sample distribution (p = 0.01).
The correlation between the observed yield means $\bar{x}$ (Figure 2, red triangles) and the predicted yield means $\hat{\mu}_g$ was r = 0.95 (data not shown). The correlation between the observed phenotypic variance $s^2_p$ and the predicted genetic variance $\hat{\sigma}^2_g$ was r = 0.34 for all 14 crosses (data not shown). However, when the two crosses D33ETI and D33QUA with downward outliers were removed, this correlation increased to r = 0.76 (data not shown). A comparison of the ranking of crosses based on the observed yield data of the best resulting DH line from each cross with the ranking of the crosses based on the criterion $\hat{\mu}_g + \hat{\sigma}_g$, which relied on the predicted parameters, showed that the prediction accuracy was sufficient to correctly identify the 50% best crosses (Figure 3).
A negative covariance existed between $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ across all crosses (Figure 4). However, when the five potential crossing partners were regarded separately for each donor line, the covariances between $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ were positive.

DISCUSSION
Despite the recent large interest in methods of cross prediction and the selection of promising crossing partners based on marker data in the plant breeding community (Bernardo, 2014;Lian et al., 2015;Mohammadi et al., 2015;Bonk et al., 2016;Han et al., 2017;Lado et al., 2017;Lehermeier et al., 2017), the application of the published analytical approaches was either demonstrated with simulated data sets or in mapping populations which are not comparable in their structure to typical breeding populations. No studies are available to our knowledge in which the applicability of analytical approaches for marker data was tested for relevant traits such as yield in plant breeding data sets. In the present study, we tested if the formulas for variance prediction presented in Osthushenrich et al. (2017) show sufficient precision for the identification of the most promising crossing partners in an ongoing resistance breeding project in barley. The data set in use in this investigation was not specifically designed for a rigorous validation of the formulas of Osthushenrich et al. (2017). For such a validation study, several parameters would need a different experimental design. We outline these parameters here to show the limits of the present evaluation.
The present study uses a set of intercrossed lines as a training set, and we evaluate the genetic variance in the same data set. Consequently, the results presented here cannot be regarded as an independent validation. Instead, we are rather investigating the fit of the model to the data. If the model does not fit the data in such an analysis, the conclusion can be drawn that the model is not suitable to explain the data. If the model is able to explain the data, however, a considerable overfitting of the model might still be present, because genomic prediction is a p > n problem where the number of independent variates (p, markers) is greater than the number of observations (n, lines). This potential overfitting was not quantified by the analyses we present here.
FIGURE 3 | The observed ranking is based on the yields of the best DH lines resulting from each cross in the field trial (left). The predicted ranking of crosses is based on the criterion $\hat{\mu}_g + \hat{\sigma}_g$ from the predicted cross parameters (right). The 50% top-ranked crosses are depicted in green. The 50% bottom-ranked crosses are depicted in red. The green and red lines indicate how the position of the crosses has changed between the observed and the predicted ranking.
We are using only small numbers of lines per cross. The estimates of the phenotypic variances within each cross are therefore not estimated with a high precision, but instead they have large standard errors and large confidence intervals. In an experiment designed to validate the formulas for variance prediction, larger family sizes would be desirable.
Due to their large standard errors, we decided not to further decompose the per-cross variances into genetic variance and within-cross residual variance. Such an analysis would have the advantage of being able to compare genetic within-cross variances, and in addition would be able to model cross-specific residuals. Nevertheless, the estimation errors of genetic variances are large, even for experiments that were designed specifically for that purpose, and in the present data set we consider the precision of per-cross estimates of genetic variances as too low for drawing valid conclusions. For this reason, we decided to present only the phenotypic per-cross variances, and to compare those with the masking variance estimated across all crosses. This enables an explorative comparison of the magnitudes of the variance components. In a purposely designed experiment, the estimation of per-cross genetic variances and their comparison with the predicted genetic variance would provide not only an explorative comparison but rather would allow more stringent hypothesis testing.
The field trial in our experiment comprised only five replications for each genotype, which resulted in a limited precision of the phenotypic data. As a consequence, the masking variance in our experiment is still of considerable size. In a validation experiment, carrying out replicated trials in more than five locations and more than one year would result in a smaller masking variance. Ideally, the design of the validation experiment should result in a masking variance that is smaller than the within-family variance. This would allow an effective within-family selection. Further, it would be desirable if the validation experiment was of a size that allowed heteroscedastic error variances for locations or even for the location:cross combinations.

FIGURE 5 | Demonstration of R Code used for cross prediction with package SelectionTools.
A further issue that is not addressed with our experimental setup is the question of whether random genetic drift or selection during the DH process might have an effect on the estimated variances. This might also be addressed in a validation experiment.
Our motivation to use the present data set, in spite of its limitations and in spite of the fact that it was not specifically designed for validation of formulas for variance prediction, was that it actually originates from a practical breeding program. We argue that the results presented here have a high transferability to applied breeding programs, whereas the results of a pure validation study would have only a limited transferability due to differences in the experimental setup.
The prediction of the yield means $\hat{\mu}_g$ and genetic variances $\hat{\sigma}^2_g$ of the 14 crosses of five registered elite varieties and four resistance donors for which phenotypic data was available yielded overall plausible results (Figure 2). For example, for the crosses of the elite variety QUA with four donor lines (last row of Figure 2), $\hat{\mu}_g$ for yield ranged between 85.93 dt/ha and 97.31 dt/ha and $\hat{\sigma}^2_g$ ranged between 1.05 dt²/ha² and 15.20 dt²/ha². For the five crosses with donor line 146 (first column of Figure 2), $\hat{\mu}_g$ for yield ranged only between 82.85 dt/ha and 85.93 dt/ha and $\hat{\sigma}^2_g$ ranged between 8.73 dt²/ha² and 12.12 dt²/ha². Differences between the crosses in $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ were thus more influenced by donor lines (columns of Figure 2) than by the elite varieties (rows of Figure 2), indicating that the elite varieties contributed little to the genetic variance $\hat{\sigma}^2_g$ of the crosses and had similar mean performance $\hat{\mu}_g$. This is also illustrated by the fact that all crosses of elite varieties with donor line ANT, which is also a highly resistant elite variety, had a comparatively high $\hat{\mu}_g$ and a low $\hat{\sigma}^2_g$ compared to the other crosses. These findings are reflected in the varying spread of the blue normal distribution curves $N(\hat{\mu}_g, \hat{\sigma}^2_g)$ in Figure 2 for the different crosses. They are also confirmed by the corresponding values for the observed yield means $\bar{x}$ (red triangles) and the observed variances $s^2_p$ from the field trial (data not shown).
While a direct comparison of $\hat{\mu}_g$ and $\bar{x}$ from the field trial is straightforward and yielded a correlation of r = 0.95 (data not shown), a direct comparison of $\hat{\sigma}^2_g$ predicted from genetic marker effects with the estimated phenotypic variance $s^2_p$ from the field trials is less straightforward.
The data set used in the present study comprises field data from only one year, a very limited number of locations and only one replication. In such a small data set, large standard errors are expected for the estimation of the phenotypic variance $s^2_p$, which result in large confidence intervals. A $1 - \alpha$ confidence interval for an observed variance $s^2$ of a sample of size $n$ from a normal distribution is defined as (Bronshtein et al., 2003):
$$\left[ \frac{(n-1)\, s^2}{\chi^2_{1-\alpha/2;\, n-1}} \; ; \; \frac{(n-1)\, s^2}{\chi^2_{\alpha/2;\, n-1}} \right]$$
where $\chi^2_{q;\, n-1}$ denotes the $q$-quantile of the chi-square distribution with $n-1$ degrees of freedom. For example, $s^2_p$ of the 13 yield values in the field trial for cross 146ETI was 32.63 dt²/ha², resulting in a large 0.95 confidence interval of [16.78; 88.91]. From this we can deduce that the point estimator of the phenotypic variance has only limited accuracy. Moreover, marker-based predictions of $\hat{\sigma}^2_g$ are predictions of the genetic variance within a cross, whereas the variance of the true observed values in a field trial is $\sigma^2_p = \sigma^2_g + \sigma^2_m$, where $\sigma^2_g$ is the genetic variance and $\sigma^2_m$ is the masking variance due to environmental effects and inaccuracies of the field trial (Piepho and Möhring, 2007). In the present study, the $s^2_m$ estimated from the field trial was 33.41 dt²/ha², while the predicted genetic variances $\hat{\sigma}^2_g$ ranged from 0.96 dt²/ha² for the cross ANTOTT to 15.20 dt²/ha² for the cross D37QUA. Thus, $s^2_m$ was in all crosses about 2-30 times higher than $\hat{\sigma}^2_g$, and was consequently the major component of the phenotypic variance $\hat{\sigma}^2_p$. To account for $\sigma^2_m$ in our comparison of predicted and observed variances $\hat{\sigma}^2_g$ and $s^2_p$, we fitted the green normal distribution curve $N(\hat{\mu}_g, \hat{\sigma}^2_g + s^2_m)$. We conducted an Anderson-Darling goodness-of-fit test to test the hypothesis that the phenotypic yield values of the DH lines from the field trial are samples from these normal distributions (Anderson and Darling, 1954). There was no evidence that this null hypothesis could be rejected for 13 out of 14 crosses (Figure 2).
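The confidence interval quoted for cross 146ETI can be checked with a few lines of Python, using the standard chi-square interval for a variance, with bounds (n - 1)s² divided by the upper and lower chi-square quantiles. Because the Python standard library has no chi-square quantile function, the two quantiles for 12 degrees of freedom are hard-coded here from standard tables; the function name is hypothetical.

```python
def variance_ci(s2, n, chi2_lower, chi2_upper):
    """Confidence interval for a variance from a normal sample.

    chi2_lower, chi2_upper: alpha/2 and 1 - alpha/2 quantiles of the
    chi-square distribution with n - 1 degrees of freedom.
    """
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Table values for 12 degrees of freedom (n = 13), alpha = 0.05
CHI2_12_LOWER = 4.4038    # 0.025 quantile
CHI2_12_UPPER = 23.3367   # 0.975 quantile

lo, hi = variance_ci(32.63, 13, CHI2_12_LOWER, CHI2_12_UPPER)
print(round(lo, 2), round(hi, 2))  # 16.78 88.91
```

The upper bound is more than five times the lower bound, which illustrates how weakly a family of 13 lines constrains its own phenotypic variance.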
However, when looking at the absolute values of the observed phenotypic variances $s^2_p$ (data not shown) and the predicted phenotypic variances $\hat{\sigma}^2_p$, our prediction of $\hat{\sigma}^2_p = \hat{\sigma}^2_g + s^2_m$ tended to overestimate the observed variance $s^2_p$ of the phenotypic values. This overestimation could be expected, as precise field trials to assess the yield are only carried out for a limited number of pre-selected individuals, while the analytical approach yields estimates for infinite unselected population sizes. Moreover, the crosses D33ETI and D33QUA featured downward outliers that might have inflated the average standard error for the adjusted treatment means and consequently the derived masking variance $s^2_m$. Under the assumption that the masking variance $\sigma^2_m$ is constant for all crosses, the correlation r between the predicted genetic variance $\hat{\sigma}^2_g$ and the observed phenotypic variance $s^2_p$ gives an idea of how valid the predictions for the evaluation of suitable crossing partners are. This correlation was r = 0.34 for all 14 crosses (data not shown). However, this low value was mainly due to the crosses D33ETI and D33QUA, which each displayed outliers in the form of two very low yield values (Figure 2), resulting in high observed variances $s^2_p$ of the phenotypic values. Excluding these two crosses, the correlation increased to r = 0.76 (data not shown). From these findings, we draw two conclusions. First, low correlations between the predicted genetic variances $\hat{\sigma}^2_g$ and the observed phenotypic variances $s^2_p$ can be caused by outliers in the field trial which result in overestimated phenotypic variances. They do not necessarily mean that the prediction approach in itself is faulty or inaccurate. Rather, accurate field trials are of major importance not only for estimating marker effects and cross prediction, but also for the plausible validation of cross prediction. The evaluation of the accuracy of cross prediction should therefore comprise a careful monitoring of the field data. Estimates of the phenotypic variance $s^2_p$ from samples with outliers should be treated with caution. Second, the results shown in Figure 2 indicate that our predictions of $\hat{\sigma}^2_g$ overall yielded reasonable results in light of the limitations of the available phenotypic data.
Despite the fact that the predicted genetic variances $\hat{\sigma}^2_g$ are difficult to validate with phenotypic data from breeding programs, they can still improve the efficiency of breeding programs with respect to long-term response to selection and efficient use of the limited plot number for field trials. Even for lower correlations between $\hat{\sigma}^2_g$ and $s^2_p$, it is reasonable to focus on crosses with high predicted genetic variance in order to maintain genetic diversity and long-term response to selection, given that reliable phenotypic and genotypic data is available for predicting marker effects.
More importantly, we argue that the main application of cross prediction in practical breeding programs is not so much to provide 100 percent accurate predictions of $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ but to allow the breeder to identify a certain fraction of promising crosses from the complete list of potential crosses in order to use the limited number of field plots efficiently. We compared the ranking of the crosses based on the criterion $\hat{\mu}_g + \hat{\sigma}_g$ to the ranking of the crosses based on the yield data of the best resulting DH line from each cross (Figure 3). In this comparison, all seven top-ranked crosses were identified correctly with the predicted parameters, allowing the breeder to efficiently narrow down the number of lines which have to be evaluated in costly field trials by 50% without reduction in selection gain.
It has been postulated that a negative covariance exists between $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ (Zhong and Jannink, 2007). This suggestion is very reasonable, as elite varieties which are fixed at many loci for superior alleles will result in crosses with high $\hat{\mu}_g$ and low $\hat{\sigma}^2_g$. This negative covariance is also observed in our data set if $\hat{\mu}_g$ is plotted against $\hat{\sigma}^2_g$ (Figure 4). For example, the ANT crosses can be considered as crosses between two elite varieties and consequently have a comparatively high $\hat{\mu}_g$ and low $\hat{\sigma}^2_g$ compared to the other crosses. In our data set, in line with the suggestions of Zhong and Jannink (2007), genetic variances $\hat{\sigma}^2_g$ were more influenced by donor lines (columns of Figure 2) than by the elite varieties (rows of Figure 2), indicating that the elite varieties contributed little to the genetic variances $\hat{\sigma}^2_g$ of the crosses. Crosses of elite varieties with donor lines 146, D33 and D37, which are pre-breeding lines with overall lower agronomic performance, have lower $\hat{\mu}_g$ and higher $\hat{\sigma}^2_g$ in comparison to the ANT crosses.
Thus, we observed that the negative covariance between $\hat{\mu}_g$ and $\hat{\sigma}^2_g$ of the crosses is mainly due to the different level of breeding intensity and selection that the donor lines have been subjected to (Figure 4). If the crosses of donor lines are regarded separately, as indicated by the different colors in Figure 4, a positive covariance existed between $\hat{\mu}_g$ and $\hat{\sigma}^2_g$. We therefore conclude that for many scenarios, for example if a specific donor line carrying desired resistance genes has to be used for trait introgression into the breeding pool, prediction of the genetic variance $\hat{\sigma}^2_g$ allows the breeder to identify the best crossing partner for this donor line from a set of different elite varieties. In addition, these predictions can also be used for improved resource allocation by investing more resources in terms of number of progeny into crosses with higher predicted genetic variance $\hat{\sigma}^2_g$. We plan further investigations in this area.
In order to provide breeders with a fast and easy-to-use tool to implement the presented approach in their breeding pipelines, routines for data pre-processing, estimation of marker effects and cross prediction with the formulas of Osthushenrich et al. (2017) have been included in the software package SelectionTools. SelectionTools allows breeders to make use of the advantages of cross prediction in a convenient way without the need for comprehensive mathematical and programming skills. With standard data formats, the presented approach can be reproduced with only a few lines of R code (Figure 5).

CONCLUSION
The analytical approach of Osthushenrich et al. (2017) yields plausible cross predictions which allow breeders to establish a ranking of potential crosses and identify a superior fraction of crosses for field evaluation. The approach is versatile and can be used for arbitrary mating systems. A major advantage of the presented approach is that it can be directly and easily used with marker effects from genome-wide prediction without time-consuming additional calculations or simulations. The prediction accuracy of means and variances is sufficiently high to derive meaningful predictions for practical application even with limited phenotypic data. We therefore expect that the formulas are applicable in a wide range of breeding programs.

AVAILABILITY OF DATA AND MATERIAL
The datasets generated and/or analyzed during the current study are not publicly available due to the confidential genotypic data of the donor lines from an ongoing research project but are available from the corresponding author on reasonable request.