Genomic Prediction Accounting for Residual Heteroskedasticity

Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit.

, thereby raising concern about the implications of the residual homoskedasticity assumption almost universally assumed in current WGP models (Gianola and Rosa 2015). Indeed, the use of heteroskedastic models for genetic evaluations dates back to work by Foulley and Gianola (1996), who first modeled the logarithm of residual variances as a linear function of fixed effects. SanCristobal-Gaudy et al. (2001) presented further extensions to incorporate random effects, including correlated genetic effects, followed by a unifying framework for the structural modeling of heterogeneous variances proposed by Kizilkaya and Tempelman (2005). With the advent of genomic selection, Yang et al. (2011) was among the first to propose a WGP model with heterogeneous residual variances for livestock populations, though only a genetic component was specified on both mean and residual variance.
Environmentally-driven heteroskedasticity has been shown to have practical implications for the prediction of genetic merit. Hill (1984) demonstrated that proportionally more individuals were likely to be selected from more variable groups if substantial heteroskedasticity was ignored using homoskedastic error models, especially if selection pressure was intense. Early attempts to remedy this problem included the preadjustment of phenotypes, e.g., by centering and scaling (Hill 1984). More modern approaches include explicit specification and modeling of sources of heteroskedasticity. For instance, Kizilkaya and Tempelman (2005) showed improved precision in estimated sire genetic merit for a birth weight trait when residual variance was specified as a function of sex and herds, finding that estimates of residual variances differed by as much as 20 times across herds.
The objectives of this study were 1) to extend classical parametric WGP models, specifically RR-BLUP, BayesA, BayesB and BayesCp, to explicitly account for residual heteroskedasticity, and 2) to assess potential gains in prediction accuracy by explicit modeling of residual variances as a function of various environmental or management factors in simulated and actual livestock performance data.
We first introduce extensions to classical WGP models to accommodate heteroskedasticity, including a delineation of the criteria used for model performance. We then describe and present comparisons based on a simulation study and also an application to carcass traits data from pigs.

Classical WGP models
The classical base WGP model expresses the phenotype y i of an individual i (i = 1, 2,. . .,n) as a linear regression function of SNP marker effects, as follows: where b is a vector of unknown fixed effects connected to the phenotypes via a known design vector x ' i ; g j is the unknown random effect of SNP marker j = 1, 2,. . ., p connected to y i via a known genotype z ij coded as 0, 1 or 2 to represent the dosages of the minor allele, and e i is the residual for animal i. Most current WGP models assume e ¼ fe i g n i¼1 $ iid Nð0; Is 2 e Þ. Differences between WGP models RR-BLUP, BayesA, BayesB and BayesCp are based upon the specification of the distribution of g ¼ fg j g p j¼1 . For RR-BLUP, g j $ iid Nð0; s 2 g Þ"j, whereas for BayesA g j j n g ; s 2 g $ iid t ng ð0; s 2 g Þ, i.e., independently and identically distributed scaled Student-t with common degrees of freedom n g and scale parameter s 2 g . This prior specification on BayesA is marginally equivalent to g j js 2 g j $ iid Nð0; s 2 g j Þ such that s 2 g j n g ; s 2 g $ x À2 ðn g ; n g s 2 g Þ with Eðs 2 g j n g ; s 2 g Þ ¼ ng ng 2 2 s 2 g (de los Yang and Tempelman 2012). BayesB further extends BayesA and specifies the distribution of g j j n g ; s 2 g as a mixture of t ng ð0; s 2 g Þ with probability ð1 2 pÞand a point mass at 0 with probability p (Meuwissen et al. 2001). BayesCp is a particular special case of BayesB with n g /Nsuch that the nonzero component of the mixture is Nð0; s 2 g Þ (Habier et al. 2011).
Heteroskedastic extension of WGP models Following Kizilkaya and Tempelman (2005), we extend WGP models to flexibly model the residual variance s 2 e as a multiplicative function of both systematic and nonsystematic environmental components, thereby explicitly accounting for heteroskedasticity. Expressed in the natural logarithmic scale, this is equivalent to writing the following where s 2 ei is the residual variance corresponding to the environmental or management circumstances for individual i; t is a s · 1 vector of unknown fixed effect parameters connected to s 2 ei via a known design vector w ' i ; and similarly, v is a t · 1 vector of unknown random effects connected to s 2 ei via a known design vector m ' i . Random effects on the residual variance may include environmental effects (i.e., contemporary groups), genetic effects or both. A priori, elements of v ¼ fv l g t l¼1 can be assumed independently distributed as inverted gamma IGða n ; a n À 1Þ, a n .2 such that Eðv l Þ ¼ 1 and s 2 v ¼ Varðv l Þ ¼ 1=ða v À 2Þ. As a result, variation across v l can be characterized by defining a coefficient of variation 2 . So specified, the magnitude of the heteroskedasticity across levels of the random effect factor v diminishes (i.e., s 2 v /0) with larger values of a v (i.e., a v /N) (Kizilkaya and Tempelman 2005).

Prior specifications
We specify a flat prior on b such that pðbÞ}constant. Here,s 2 e in the homoskedastic error model, as well as elements of t in Equation (2), were specified with noninformative priors x À2 ð-1; 0Þ (Gelman 2006). The hyperparameter a n was assigned the vague, though proper, prior density pða n Þ}ð1 þ a n Þ 22 , which is commonly used for strictly positive parameters (Kizilkaya and Tempelman 2005). As previously shown by Albert (1988), this prior defines a uniform prior density U(0,1) on notes the probability density function. For RR-BLUP and BayesCp, we specify s 2 g $ x À2 ð-1; 0Þ whereas for BayesA and BayesB, we specify s 2 g j $ x À2 ðn g ; n g s 2 g Þ with n g $ pðn g Þ}ð1 þ n g Þ -2 and s 2 g $ x À2 ð-1; 0Þ. Finally, for BayesB and BayesCp, p is assigned a Beta(10,1) prior to reflect a relatively weak assumption that most markers have null effects for any given trait.

Simulation study
We compared the performance of classical WGP models, namely RR-BLUP, BayesA, BayesB and BayesCp, to that of their heteroskedastic error counterparts using a simulation study.
Ten data set replicates were each generated from base populations of 150 unrelated individuals subjected to random mating for 6000 generations. Population size was kept constant until generation 6000, after which it was expanded 10 times to 1500 individuals. The genome was composed of three chromosomes, each of length 1 Morgan, and each containing a total of 10,000 equally-spaced monomorphic loci. The number of crossover events per meiosis was simulated from a Poisson distribution with mean 1 and the location of crossover was assumed uniformly distributed in a chromosome. The mutation rate for all loci was specified to be 2:5 · 10 24 per locus per generation and to be recurrent so as to ensure biallelic loci.
In Generation 6001, loci with minor allele frequency (MAF) , 0.1 or loci that failed to meet Hardy-Weinberg equilibrium based on an exact test (Wigginton et al. 2005) at a significant level of 0.0001 were discarded. For each dataset replicate, 60 loci were randomly selected to serve as quantitative trait loci (QTL), and an additional 3000 different loci were randomly selected to serve as SNP markers. For each of the 60 QTL, an allelic substitution effect a k (k = 1, 2,. . ., 60) was drawn from a t 5 (0,0.005), i.e., a Student-t distribution with 0 mean and a scale of 0.005 based on five degrees of freedom. Our choice of a heavy-tailed distribution for the QTL effects is consistent with current notions of the genetic architecture of quantitative traits in livestock population, by which traits seem to be controlled by many genes of small effect and few of large effects Goddard et al. 2009). The total additive genetic variance s 2 a was constructed from the weighted sum of genetic variances across the QTL effects, namely s 2 a ¼ 2 P 60 k¼1 q k ð1 2 q k Þa 2 k , where q k is the MAF at QTL k. The true breeding value (TBV) for an individual i was obtained as the aggregated allelic substitution effects a k over the selected 60 QTL loci, each weighted by its corresponding allelic dosage z ik , such that TBV i ¼ P 60 k¼1 z ik a k . Trait heritability was set at h 2 ¼ 0:4. Within each data replicate, we considered five different simulation scenarios reflecting various degrees of residual heteroskedasticity. That is, the replicated datasets described in the previous paragraph were used as blocking factors to compare scenarios across a heteroskedastic error gradient. Simulation scenarios included the case of homoskedastic residuals whereby t 1 ¼ t 2 ¼ s 2 e and v l ¼ 1 for all l =1, 2,. . ., 50 levels of a random effects factor, such that s 2 ei ¼ s 2 e "i. In this study, specification of a n /Nrepresents the homoskedastic error scenario, as s 2 v /0. In turn, other scenarios were defined by increasing levels of residual heteroskedasticity; i.e., a n = 50, 12, 5, and 3, such that the standard deviations s v ¼ q and 1, respectively. In addition, all heteroskedastic error scenarios (i.e., a n = 50, 12, 5, and 3) further incorporated systematic sources of heterogeneity whereby t 1 ¼ 0:8 Á s 2Ã e and t 2 ¼ s 2Ã e , where s 2Ã e is a "fixed" reference residual variance. For data generation, the residual e i was sampled from Nð0; s 2 ei Þ where s 2 ei was obtained as a function of t and v, as described in Equation (2). The phenotypic observation for individual i was generated as y i ¼ m þ TBV i þ e i , with m ¼ 3 set as a common mean for all observations. Observations from Generation 6001 were used as a training set to fit the competing WGP models and to estimate SNP effects. For each simulated dataset, individuals from Generation 6001 were randomly mated to produce Generation 6002 consisting of additional 1500 animals. Genotypes and TBV from individuals in Generation 6002 were generated to be used for validation in the simulation study. The average level of linkage disequilibrium (LD) between adjacent markers in the simulation study ranged between 0.23 to 0.25 across all replicated datasets, to represent current livestock populations (Meuwissen et al. 2001;Calus et al. 2008;Hayes et al. 2009;Yang et al. 2011).
Each replicated dataset was fitted using homoskedastic and heteroskedastic error versions of the selected WGP models, namely RR-BLUP, BayesA, BayesB and BayesCp. Programming code needed to implement these models is available in Supporting Information, File S1. Across models, Markov Chain Monte Carlo (MCMC) was implemented with burn-in lengths of 10,000 to 35,000, followed by subsequent saving of the next 140,000 to 480,000 cycles, depending on the WGP models and diagnostics as described subsequently.
Application to MSU swine resource population data Data corresponding to a three-generation Duroc · Pietrain swine resource population developed at Michigan State University (MSU) was used in this study. A detailed description of the dataset is available in Edwards et al. (2008aEdwards et al. ( , 2008b. Briefly, a total of 19 F 0 , 55 F 1 and 928 F 2 pigs were included in the pedigree. All F 0 and F 1 animals as well as 336 F 2 animals were genotyped using the commercial Illumina PorcineSNP60 beadchip (GeneSeek a Neogen Co., Lincoln, NE) panel with a total of 62,163 SNP markers (Gualdron Duarte et al. 2014). Markers with more than 10% missing data, unknown physical positions, or with MAF , 0.01 were removed from further analyses. Quality control procedures followed those described in Badke et al. (2012). Genotypes for the remaining 592 F 2 animals were obtained using a low-density panel of 9K tagSNP set referred to as the GeneSeek Genomic Profiler for Porcine LD (GGP-Porcine LD, GeneSeek a Neogen Company) consisting of a subset of the PorcineSNP60 panel. The F 2 animals genotyped with the 9K low density panel were imputed to 60K with imputation accuracy of approximately 0.99, as previously described (Gualdron Duarte et al. 2013). From the 60K SNP, a subset of 6210 markers was selected for this study. The selected SNP subset matched the panel of 10K tagSNP previously described by Badke et al. (2014). Phenotypes corresponding to 29 growth traits and 25 carcass and meat quality traits were obtained for F 2 animals, as described by Edwards et al. (2008aEdwards et al. ( , 2008b. Traits were subjected to preliminary screening for heterogeneous residual variances using standard linear mixed models and approximately 80% of the traits showed some degree of residual heteroskedasticity. Two traits, namely carcass temperature at 45 min postmortem, and loin muscle pH at 45 min postmortem, were selected for further consideration based on potentially high and mild levels of heteroskedasticity, respectively, and were thus subjected to follow-up WGP analysis (see next section). Phenotypes for 921 and 908 F 2 individuals were available for 45 min postmortem carcass temperature and for loin muscle pH, respectively. Phenotypes of the selected traits, genotypes and pedigree of the available animals were contained in the Supporting Information, File S2.
Each of the two selected traits were fitted using RR-BLUP, BayesA, BayesB and BayesCp WGP models in both their homoskedastic and heteroskedastic error specifications. For both traits, bincluded the fixed effect of sex and a regression coefficient on carcass weight. The general WGP model in Equation (1) was further expanded to incorporate clustering effects of slaughter dates d ¼ fd q g 33 q¼1 $ Nð0; s 2 d Þas well as polygenic effects u $ Nð0; As 2 u Þ, where A is a known pedigree-based additive relationship matrix. Therefore, in our data application, the genomic expected breeding value (GEBV) for individual i (i = 1,2,. . .,n) was defined as P p j¼1 z ijĝ j þû i . We modeled heterogeneous residual variances as presented in Equation (2), with t and v ¼ fv l g 33 l¼1 specifying the fixed effects of sex and the random clustering effects of slaughter dates, respectively. Thus, the hyperparameter a n in v l $ IGða n ; a n À 1Þ reflects the magnitude of residual heteroskedasticity in the responses of interest due to slaughter dates clusters.
Prior specifications were similar to those described for the simulation study, with the following exceptions due to problems with parameter identifiability. For BayesCp, the prior hyperparameter n g was set at n g = 3 for both traits to maximize prior uncertainty while retaining a defined mean (i.e., n g . 2). Instead, the prior scale s 2 g for BayesCp was specified as s 2 g ¼ 4:95 · 10 27 for carcass temperature and s 2 g ¼2:01 · 10 28 for loin muscle pH, based on the posterior medians of s 2 g obtained in BayesA. For BayesB, the hyperparameter s 2 g was assumed known and set at 6:35 · 10 27 for carcass temperature and 2:64 · 10 28 for loin muscle pH, whereby these values were obtained based on the posterior median of s 2 g under BayesCp with a homoskedastic error assumption. Sensitivity analyses were conducted to assess the influence of specifying s 2 g on posterior inference of interest. Also due to parameter identifiability issues, the variance s 2 u of the polygenic effects was first estimated from traditional (i.e., non-WGP) animal models that either assumed residual homo-or heteroskedasticity. These estimates of s 2 u under homo-and heteroskedastic assumptions were then specified as known constants when fitting homo-and heteroskedastic WGP models, respectively. Homoskedastic and heteroskedastic error specifications of the selected WGP models were fitted to each trait. In every case, a total of 20 parallel MCMC chains were run, each consisting of 12,000 to 27,000 burn-in cycles followed by 6,000 to 14,000 saved cycles. Post burn-in samples from the 20 parallel MCMC chains run on a given model can be considered samples from the joint posterior density of interest, and were thus combined for inference. Initial values of hyperparameters in each parallel chain were dispersed by an arbitrary small value while constraining them to fall within their allowable parameter space (Gelman and Rubin 1992). Posterior inference on parameters of interest was summarized for the overall dataset.

Model comparison
For each of the WGP models considered, namely RR-BLUP, BayesA, BayesB and BayesCp, the performance of the homoskedastic vs. its heteroskedastic error model counterparts was compared in both simulated data and real data using various criteria for model fit and for prediction, as follows.
First, we compared quality of global model fit using pseudo-Bayes factor (PBF) (Gelfand 1996), defined as the ratio of the conditional likelihood function under each heteroskedastic error WGP model over its homoskedastic counterpart, expressed in logarithmic scale of base 10, as follows: log 10 PBF HT;HO ¼ X n i¼1 log 10 Lðy i jy 2 i ; HTÞ À X n i¼1 log 10 Lðy i jy 2 i ; HOÞ where the abbreviation HT and HO hereafter refer to the candidate heteroskedastic and homoskedastic models, respectively. Moreover, Lð Á Þ denotes the likelihood function of observation y i conditional on all remaining observations fitted with the corresponding WGP model. This conditional likelihood, also known as the conditional predictive ordinate (Gelfand 1996) for observation i, can be approximated by , where B is the number of post burn-in MCMC iterations; u ðbÞ represents the posterior sample for model parameters u after b iterations post burn-in (b =1,2,. . .,B). A positive value of log 10 PBF HT;HO indicates support for the heteroskedastic error model based on enhanced fit to the data relative to its homoskedastic error WGP counterpart, thereby indicating evidence for heterogeneity of residual variances.
We further compared predictive performance of breeding values between competing homoskedastic and heteroskedastic error alternatives of each WGP model. For simulated data, we assess genomic prediction accuracy using the Pearson correlation between TBV in the simulated validation set and corresponding estimates GEBV i ¼ P p j¼1 z ijĝ j , wherebyĝ j were obtained by fitting the WGP model to the simulated training set. Within each WGP model, we compared homoskedastic vs. heteroskedastic error specifications across the various scenarios using a multifactorial ANOVA on genomic prediction accuracy, with the simulated replicated dataset as a random blocking factor.
For the real data application, predictive performance was assessed using a five-fold cross-validation (Daetwyler et al. 2013), whereby animals within each slaughter dates cluster were randomly assigned to five nonoverlapping data partitions or folds of nearly equal size (175-191 animals). Each one of the five data folds was assigned to be a validation set exactly once. When a data fold was selected as a validation set, phenotypes of this particular fold were excluded from estimation of marker effects. Instead, phenotypes of the validation fold were predicted using estimates of SNP markers, polygenic and nongenetic effects obtained from fitting a model to the remaining data folds, referred to here at training folds. This procedure was repeated until each of the five data folds had served as a validation set once. Consequently, every phenotyped animal was excluded from estimation of marker effects once, in which case their phenotypes were predicted using estimates obtained from animals in corresponding training folds (Meuwissen et al. 2013). We defined cross-validation predictive ability as the Pearson correlation coefficient between observed phenotypes in the validation fold, and the corresponding predicted phenotypes from parameter estimates obtained from the training folds. That is, rðy i ;ŷ i Þ, where y i andŷ i are the observed and predicted phenotypes, respectively, for animal i in the validation fold. The predicted phenotypesŷ i included estimated marker effects (weighted by their allelic frequencies) and estimated polygenic effects, as well as the estimated fixed effects of sex and carcass weight, and the random blocking effects of slaughter dates.
We also characterized potential practical implications of heteroskedasticity in the context of breeding decisions based on WGP. More specifically, we computed the Spearman's rank correlation coefficient between GEBV from homoskedastic vs. heteroskedastic error specifications for the top and bottom 10% ranked individuals. Relative ranking of top and bottom 10% individuals was assessed by fitting a linear mixed model to the estimated Spearman rank correlations obtained from data replicates, and testing for differences between simulation scenarios. For real data, rank correlations of GEBV for top and bottom 10% individuals in the validation sets were compared between homoskedastic and heteroskedastic WGP models.

MCMC diagnostics
Convergence diagnostics were implemented using the R package CODA (Plummer et al. 2006). We monitored convergence using trace plots. Diagnostic tests by Raftery and Lewis (1992) and by Heidelberger and Welch (1983) were conducted on the simulation study. For the data application, the Gelman and Rubin's diagnostic on multiple MCMC chains produced a shrinkage factor , 1.2 (Colosimo and Del Castillo 2007). We also determined effective sample size (ESS) for key hyperparameters (Kass et al. 1998). In each case, the number of MCMC cycles was adjusted to ensure that the ESS was greater than 100 for all hyperparameters.
Data availability Data and R code are available in File S1 and File S2 (refer to "README_ data.txt" and "README_mcmc.txt", respectively, for details).

Simulation study
For each of the WGP models considered in this study, Figure 1 shows comparisons of global fit, expressed as log 10 PBF HT;HO , between homoskedastic and heteroskedastic model specifications for scenarios reflecting a gradient of increasing residual heteroskedasticity. Recall that positive values of the log 10 PBF HT;HO indicate support for the heteroskedastic, as opposed to the homoskedastic error version of the corresponding WGP model. When residual heteroskedasticity was high (a n = 3 or 5), log 10 PBF HT;HO was estimated to be between 12.8 and 77.3 across MC replicates fitted with any of the WGP models. This supports a strong advantage in global fit for heteroskedastic, rather than homoskedastic, error specifications, regardless of the specific WGP model. As the amount of residual heteroskedasticity decreased (a n = 12), so did the values of log 10 PBF HT;HO and thus the relative advantage of the heteroskedastic error WGP model over its homoskedastic error counterpart. Under scenarios of low heteroskedasticity (a n = 50), or of homogeneous residual variance (a n / N), the values of log 10 PBF HT;HO under RR-BLUP and BayesA were closer to zero, thus indicating no apparent advantage of heteroskedastic WGP models over their homoskedastic error counterparts; in turn, BayesB and BayesCp showed greater uncertainty in these conditions. Overall, we note that, when the amount of residual heteroskedasticity in the data was high (a n = 3 or 5), log 10 PBF HT;HO consistently selected the appropriate heteroskedastic error specification for all WGP models; however, as mentioned above, the discriminatory capability of log 10 PBF HT;HO to detect random sources of residual heteroskedasticity was partially, though incrementally, impaired when heteroskedasticity was moderate (a n = 12) or low (a n = 50). Table 1 presents a summary of the posterior inference on the hyperparameter a n defining the degree of heterogeneity of residual variance across levels of the random effect factor for 10 MC replicates under each of the simulation scenarios fitted with the heteroskedastic WGP models. Coverage probabilities for a n , defined as proportion MC replicates for which the 95% highest posterior density (HPD) included the true parameter value, under WGP models RR-BLUP and BayesA were mutually identical at 100%, 90%, 100%,100% when true a n = 3, 5, 12, 50, respectively. In turn, coverage probabilities were 100%, 90%, 70%, 20% for BayesB and 100%, 90%, 80%, 20% for BayesCp, when true a n = 3, 5, 12, 50, respectively. As might be expected, inferential precision on a n was maximized when heteroskedasticity was high, as indicated by the smallest difference between minimum and maximum values of the lower and upper boundaries of the 95% HPD on a n ( Table 1). The reverse was also true, as inference on a n was most uncertain when heteroskedasticity was not present. That is, enhanced model fit of a heteroskedastic WGP model relative to its homoskedastic counterpart under conditions of heterogeneous variances may be explained by increased precision of inference on the hyperparametera n , and vice versa. These results on posterior inference of the heteroskedasticity parameter a n are consistent with those presented for overall goodness of fit in Figure 1.
To validate inferential performance on the fixed effect parameters t specified on the residual variance, we considered the posterior density of the ratio of t 1 over t 2 . Coverage probability of the 95% HPD for the true value of the parameter ratio under heteroskedastic WGP models was 92% for both BayesB and BayesCp, and 94% for both RR-BLUP and BayesA across simulation scenarios. In all cases, the observed coverage was within probabilistic expectation.
Estimated genomic prediction accuracies (and corresponding standard errors) of heteroskedastic and homoskedastic error versions of WGP models are shown in Figure 2. For all RR-BLUP, BayesA, BayesB and BayesCp models, the heteroskedastic specification showed a gain on genomic prediction accuracy relative to the homoskedastic WGP counterpart whenever the amount of residual heteroskedasticity in the data were high (i.e., a n = 3 or 5, P , 0.001 in all cases). However, no evidence for any predictive advantage of heteroskedastic WGP specifications was apparent if the data had been generated under conditions of low or null heteroskedasticity (i.e., a n = 50 or N; P . 0.30 in all cases). For situations of moderate heteroskedasticity (i.e., a n = 12) fitted with Figure 1 Assessment of global fit, expressed as log10 pseudo-Bayes factor (PBF), between heteroskedastic and homoskedastic whole-genome prediction (WGP) models, namely RR-BLUP, BayesA, BayesB, and BayesCp, for 10 replicated data sets from each of five simulated scenarios defined by either residual homoskedasticity (a v / N) or a gradient of residual heteroskedasticity ranging from high (a v = 3, 5) to moderate (a v = 12) to low (a v = 50). A horizontal reference line is provided at zero. RR-BLUP or BayesA WGP models, the heteroskedastic error specification yielded greater (P , 0.05) genomic predictive accuracy than its homoskedastic counterpart but this difference was not apparent using variable selection models like BayesB or BayesCp. Despite the significant increase in genomic prediction accuracy by heteroskedastic WGP models when the data were highly heteroskedastic, we note that the gain in accuracy relative to the homoskedastic specification was of a relatively small magnitude (i.e., range from 0.009 to 0.018 for a n = 3 and from 0.005 to 0.008 for a n = 5 across MC replicated data sets).
To further characterize potential practical implications of heteroskedastic vs. homoskedastic WGP models in the context of breeding programs, we explored differences in the ranking of individuals of extreme genetic merit. We computed the Spearman correlation of the top 10% individuals whose GEBV had been estimated from homoskedastic and heteroskedastic WGP models across the simulated gradient of residual heteroskedasticity ( Figure  3). Results on the bottom 10% ranked individuals showed a similar pattern and are thus not discussed further. For homoskedastic scenarios (a n / N) or scenarios of low heteroskedasticity (a n = 50), the Spearman rank correlation between homoskedastic-and heteroskedastic-based GEBVs from RR-BLUP, BayesA, BayesB or BayesCp WGP models for the top 10% animals was close to 1, thus indicating minor concerns for selection purposes. However, as the amount of heteroskedasticity increased, the Spearman correlation between heteroskedastic-based GEBV and homoskedasticassuming GEBV for the top 10% individuals decreased to an estimated value of 0.85. This result suggests nonnegligible reranking of top individuals for selection purposes. Given response to selection, this finding could potentially have direct implications for breeding programs despite the small magnitude of the overall gain on genomic prediction accuracy described before.

MSU swine resource population data
For carcass temperature at 45 min postmortem, the variance s 2 u of polygenic effects were estimated at 0.022 and 0.036 for homoskedastic and heteroskedastic error specifications, respectively. For loin muscle pH at 45 min, the corresponding estimates of s 2 u were 0.006 and 0.005, respectively.
We first assessed evidence for residual heteroskedasticity on the traits carcass temperature and loin muscle pH 45 min postmortem selected for this study from the MSU resource population. Table 2 summarizes the posterior inference for a n in the heteroskedastic specification of WGP models on the complete dataset. Recall that the hyperparameter a n defines the magnitude of non-systematic heterogeneity of residual variances for each of the selected traits as a function of slaughter dates clusters. For carcass temperature at 45 min postmortem, the posterior mean of a n was smaller than two in all cases. Similarly, the magnitude of the upper boundaries of the 95% HPD on a n did not exceed three n Median of the posterior mean of a v ( e a v ) [as well as minimum and maximum values for the respective lower and upper boundaries (minL, maxU) of the 95% highest posterior density intervals of the posterior distribution of a v ] based on 10 Monte Carlo replicates across simulation scenarios consisting of a gradient of increasing heteroskedasticity. Simulated data were fitted with heteroskedastic specifications of whole-genome prediction models, namely RR-BLUP, BayesA, BayesB, and BayesCp.

Figure 2
Genomic prediction accuracy (least square mean estimates and 95% confidence intervals) for heteroskedastic (gray) and homoskedastic (white) specifications of WGP models considered in this study, namely RR-BLUP, BayesA, BayesB, and BayesCp, under simulation scenarios defined by either residual homoskedasticity (a v / N) or a gradient of residual heteroskedasticity ranging from high (a v = 3, 5) to moderate (a v = 12) to low (a v = 50). Genomic prediction accuracy was defined as the Pearson correlation coefficient between true breeding value and expected breeding value. ÃÃ and Ã indicate differences between heteroskedastic and homoskedastic versions of each WGP model at P = 0.001 and P = 0.05, respectively. under any of the heteroskedastic WGP models considered, thereby providing evidence for strong cluster-based heterogeneity of residual variances for this trait. In fact, the range of posterior means of slaughter date-specific v l for carcass temperature at 45 min postmortem was greater than eight-fold, thereby indicating that the residual variance for some slaughter dates clusters was estimated to be as large as eight times greater than the residual variance in other slaughter clusters. In turn, for loin muscle pH at 45 min postmortem, the posterior mean of a n was below six, and the corresponding upper boundaries of its 95% HPD was approximately 10 in all cases, thus indicating milder clusterdriven residual heteroskedasticity for this trait, whereby the range of posterior means of v l was close to three-fold across all 33 slaughter dates clusters. These results are consistent with our findings during preliminary data screening.
We also explored the effect of sex as an additional source of heteroskedasticity on the selected traits, as represented by parameter t in Equation (2). Based on the set-to-zero parameterization implemented in this study, the parameter t may be interpreted as a ratio of female-to-male residual variances, whereby a ratio of one indicates homogeneous residual variances for both sexes. For carcass temperature at 45 min postmortem, the 95% HPD of t fitted with any of the WGP models ranged from a lower boundary of 0.7 to an upper boundary of 1.2. For loin muscle pH, the corresponding 95% HPD range was 0.9 to 1.5. It then follows no evidence for sex-based heteroskedasticity of either trait regardless of WGP model. Next, we consider relative global fit of homoskedastic vs. heteroskedastic error WGP models to the actual data using PBF (Gelfand 1996), and use a threshold value of log 10 PBF HT;HO ¼ 2 to conclude upon a decisive difference in fit between models (Kass and Raftery 1995). For carcass temperature at 45 min postmortem, the range of log 10 PBF HT;HO across the five cross-validation folds was [45.7, 81.3] for [46.9,81.0] for BayesA, [44.3, 77.5] for BayesB,and [43.8,76.9] for BayesCp WGP models. In turn, for loin muscle pH at 45 min postmortem, the range of log 10 PBF HT;HO was [9.7,14.4],[9.8,14.6],[8.8,13.7] and [8.6,13.8]. These results favor the use of heteroskedastic WGP error models for both traits, and under all WGP specifications considered here. The larger magnitude of log 10 PBF HT;HO , and hence greater evidence of residual heteroskedasticity for carcass temperature relative to loin muscle pH, is consistent both with our preliminary screening and with our posterior inference on a n as described previously.
changes in the specification of s 2 g were compensated with changes in the estimates of the proportion of SNP markers with nonzero effects (Yang et al. 2015). In turn, the estimated median cross-validation prediction accuracies for carcass temperature, and its corresponding standard deviation, across cross-validation folds at any choice of s 2 g were 0.86 6 0.05 for homoskedastic error BayesCp or BayesB and 0.86 6 0.04 for heteroskedastic error BayesCp or BayesB. For loin muscle pH, cross-validation prediction accuracy based on homoskedastic error BayesCp or BayesB ranged from 0.31 6 0.06 to 0.29 6 0.07 across prior specifications of s 2 g . For heteroskedastic error BayesCp or BayesB, cross-validation predictive ability ranged from 0.32 6 0.07 to 0.29 6 0.07 across values of s 2 g . Overall, sensitivity analyses assessment indicated little reason to be concerned about specification of hyperparameters for the purpose of prediction accuracy. Figure 4 depicts estimated cross-validation predictive abilities across five folds for both carcass temperature and loin muscle pH at 45 min postmortem. Across WGP models, cross-validation predictive abilities for carcass temperature and loin muscle pH 45 min postmortem were estimated to be approximately 0.85 and 0.32, respectively. For neither trait did we find any evidence for differences in cross-validation predictive ability between homoskedastic vs. heteroskedastic specifications of any of the WGP models considered (P . 0.25 in all cases for either trait).
Often in animal breeding, greater interest is directed toward animals that exhibit extreme GEBV, as they are the ones likely to be selected as parents for the next generation. Table 3 reports Spearman's rank correlation coefficient between homoskedastic-error vs. heteroskedasticerror GEBV for animals of extreme genetic merit. For both traits, and regardless of WGP model, we observed considerable reranking of the top and bottom 10% individuals, particularly as the degree of residual variance heterogeneity in the data increased. In fact, for loin muscle pH at 45 min postmortem, the corresponding estimated median rank correlations ranged from 0.52 to 0.70 (top), and from 0.64 to 0.70 (bottom) across WGP models; in turn, for the more heteroskedastic trait (i.e., carcass temperature at 45 min postmortem), the median rank correlation of GEBV for top and bottom 10% animals ranged from 0.05 to 0.38 (top) and from 0.43 to 0.54 (bottom) across WGP models. Such variability in rankings of GEBV may be partially due to the relatively small sample size (i.e., only 10% animals within a cross-validation fold) used to estimate the rank correlation coefficient. This is further supported by the considerable variability observed among cross-validation folds in the reranking of individuals based on homoskedastic-based GEBV relative to their heteroskedastic counterpart, though this variability was particularly noticeable for carcass temperature. Again, this may be partially explained by the relatively larger magnitude of residual heteroskedasticity detected for this trait. We further note that reranking of extreme GEBV using homoskedastic vs. heteroskedastic errors seemed to be particularly extreme when the BayesB WGP specification was implemented.
For further illustration, we selected a cross-validation fold and depicted a scatterplot of homoskedastic-error vs. heteroskedastic-error GEBV for carcass temperature ( Figure 5A) and for loin muscle pH ( Figure 5B) under each WGP model. For both traits, individuals that showed extreme genetic merit under homoskedastic error assumptions had their GEBV considerably attenuated when residual heteroskedasticity was accounted for (e.g., Figure 5A, BayesB). In fact, an individual with extremely high GEBV inferred under a homoskedastic error model may not be considered as a viable selection candidate if its GEBV was estimated from a heteroskedastic error WGP model. This was indeed the case for the two individuals with top homoskedastic-based GEBV in the complete dataset. It is interesting to note that these top two individuals derived from one slaughter date cluster that had the largest posterior mean for the relative residual variance v l (data not shown). Conversely, candidate individuals with top or bottom genetic merit may be overlooked by using conventional homoskedastic error models (e.g., Figure 5B, BayesB).
Taken together, the observed reranking could have practical implications from a selection point of view. In support of this observation, we note that the mean of genomic breeding values in the top 10% individuals was between 1.2 · (based on BayesA) to 10 · (based on BayesCp) greater in magnitude when estimated based on the heteroskedastic WGP model relative to the homoskedastic specification for either trait. Similar results were observed for the bottom 10% individuals with extreme genetic merit based on the hetero-vs. homoskedastic WGP specification. It should be acknowledged, though, that these comparisons are conditioned on the models used to predict mean genomic breeding values.

DISCUSSION
In this study, we extend classical WGP models to account for potential heterogeneous residual variances across environments, and further assess whether explicit accounting for such heteroskedasticity may impact accuracy of prediction of genomic breeding values.
Environmental residual heteroskedasticity is a rather common phenomenon across agricultural environments in livestock production. For instance, residual variance estimates for birth weight in an Italian Piemontese cattle population differed by approximately 10-fold across herds (Kizilkaya and Tempelman 2005), and that of average daily gains in feedlot cattle from the US Midwest differed by more than 12-fold across contemporary groups (Cernicchiaro et al. 2013). Backfat thickness in pigs was shown to display considerable residual heteroskedasticity based on an animal model, whereby residual variances ranged by Figure 4 Estimated cross-validation predictive ability (and 95% confidence intervals) of genomic breeding values for (A) carcass temperature 45 min postmortem and (B) loin muscle pH 45 min postmortem fitted with heteroskedastic (gray bars) and homoskedastic (white bars) specifications of WGP models considered in this study, namely RR-BLUP, BayesA, BayesB, and BayesCp. Cross-validation predictive ability is represented by the Pearson correlation coefficient between observed phenotypes in the validation fold, and their corresponding predictions based on estimates from the training folds in a five-fold cross-validation study.
approximately eight-fold across herds (See 1998). Similarly, in this study, we found evidence for considerable environmentally-driven heterogeneity of residual variances in other swine carcass traits, as indicated by the small magnitude of posterior means of a n for carcass temperature and loin muscle pH. These results indicate considerable departure from the residual homoskedasticity assumption commonly invoked by standard WGP models. Gianola and Rosa (2015) adverted that modeling heterogeneous residual variances across environments was likely to be important for reliable genomic selection, as further supported by our results. Unaccounted-for heteroskedastic errors can potentially impact breeding decisions as animals from the most diverse environments might then be disproportionally selected (Hill 1984). Indeed, previous studies have shown nonnegligible reranking of top and bottom 10% progenitors when heterogeneity of residual variances across environment or management groups is properly modeled (Cardoso et al. 2005;Kizilkaya and Tempelman 2005). In fact, incorrectly assuming homogeneous residual variances could cause a substantial reduction in selection efficiency, particularly under conditions of low heritability (Garrick and VanVleck 1987). Heritabilities of most technological quality traits of meat in swine, such as the ones evaluated in this study, has been reported to range from low to moderate, as the average value for many studies fall into the range 0.10-0.30 (reviewed by Sellier andMonin 1994 andCiobanu et al. 2011).
Based on this evidence, we extended homoskedastic WGP models to allow for environmental heterogeneity of residual variances, and evaluated the relative performance of heteroskedastic and homoskedastic error WGP models in terms of both global fit and predictive performance. Our simulation study showed considerable improvements in global model fit when extreme residual heteroskedasticity was properly accounted for, though the advantage of heteroskedastic error WGP models seemed to dissipate quickly for even moderate amounts of environmental heterogeneity in residual variances, particularly under WGP models without variable selection (i.e., RR-BLUP and BayesA). Furthermore, the observed advantage of heteroskedastic error WGP models in global fit translated into very small ($1-2%), albeit significant, gains in genomic prediction accuracy under conditions of extreme data heteroskedasticity. As the amount of residual heteroskedasticity decreased, so did the power to detect differences in genomic prediction accuracy between heteroskedastic and homoskedastic model specification. This was particularly noticeable for WGP models with variable selection (i.e., BayesB or BayesCp) relative to those without variable selection (i.e., RR-BLUP and BayesA). For the specific data application used in this study corresponding to the MSU swine resource population, there was no evidence of any gains in cross-validation predictive ability for selected carcass traits when heterogeneous residual variances across environments were explicitly modeled. This finding was somewhat unexpected given the extreme level of environmental heterogeneity of residual variances observed in at least one of those traits. The high level of environmental heteroskedasticity in the carcass temperature trait was recognized both by posterior inference on the hyperparameter a v and by improved global model fit of the heteroskedastic error model relative to the homoskedastic error model. Yet, it is possible that additional gains in prediction accuracy from specifying heteroskedasticity, either of environmental or genetic origin, may be difficult to observe due to the already large magnitude of "baseline" cross-validation predictive ability for this trait ($0.85) based on standard homoskedastic error WGP models.
It is unclear whether a genetic component might have contributed to the high level of residual heteroskedasticity observed in the carcass temperature trait. A recent study by Yang et al. (2011) explored the use of parametric genomic models that specify genetic control of environmental variance in a swine production system. In particular, classical WGP models were extended to assess putative marker effects not only on GEBV but also on environmental variability. Consistent with our results, that study indicated enhanced fit of heteroskedastic error models. However, the gains in accuracy of prediction were either of small magnitude in simulated data or not at all apparent when applied to back fat thickness data in pigs, as was also observed in our application. Additional statistical methods for detecting genetic loci affecting phenotypic variability were recently introduced; proposed approaches range from fully-parametric (Rönnegård and Valdar 2011;Yang et al. 2011) through classical nonparametric (Paré et al. 2010;Struchalin et al. 2010), including a two-stage semi-parametric approximation (Hill and Mulder 2010). Following a thorough review, Rönnegård and Valdar (2012) highlighted the inferential importance of simultaneous estimation of effects on mean and variance as a strength of parametric methods for modeling variance-controlling QTL.
Ideally, heterogeneous genetic and residual variances should be modeled simultaneously, for example with multiple breed studies (de Roos et al. 2009) or genotype by environment interaction (Edwards and Jannink 2006;Jarquin et al. 2014;Lopez Cruz et al. 2015) studies, both of which require the specification of heterogeneous genetic variances. A study by Chen et al. (2014) explored multi-population genomic prediction for milk production traits on two dairy breeds using a multitask Bayesian learning model. When error variances and marker effects variances were explicitly specified as heterogeneous across breeds (as opposed to assumed homogeneous), gains in prediction accuracy across traits ranged from 0.04 to 0.14 using a 50K SNP panel, and from 0.02 to 0.11. Also, a yield variety trial of 40 oat genotypes across 34 environments reported "stabilized" genetic predictions (i.e., higher repeatability) when genetic and environmental sources of heterogeneity were explicitly specified on residual and genotype-by-environment variance components (Edwards and Jannink 2006).
It is often the case that deregressed expected breeding values are weighted and used as response variables in WGP models (Garrick et al. 2009) instead of actual phenotypes. In a way, modeling of weighted n deregressed expected breeding values may be considered an approach to account of heterogeneous variance, in this case the variance of the expected breeding value. However, this approach may be considered ad hoc as its effectiveness depends on several factors such as the number of repeated measured observations, size of training data and the reliability of the breeding values (Garrick et al. 2009;Ostersen et al. 2011;Boddhireddy et al. 2014).
Despite the observed lack of any appreciable gain in overall accuracy of prediction of carcass trait phenotypes by heteroskedastic WGP models, the differential ranking of animals with the 10% most extreme genetic merit suggest important practical implications for the assumption of homogeneous residual variance. Substantial reranking was apparent for these candidate animals depending on whether environmental residual heteroskedasticity was explicitly accounted for in obtaining their breeding values. In fact, noticeable differences in the mean of genomic breeding values for individuals of extreme genetic merit were apparent based on heteroskedastic vs. homoskedastic WGP specifications. To reconcile these results, we notice that the magnitude of the difference in GEBV between most extreme individuals and the rest seems to be rather small, and thus difficult to detect, particularly given the relatively narrow range of GEBV observed for the selected traits in this population. In turn, a localized performance metric, such as the Spearman correlation coefficient among the top and bottom 10% individuals, could detect regional patterns that may not be necessarily apparent from overall performance metrics, such as cross-validation predictive ability. However, we acknowledge that, given the low-to-moderate heritabilities of the traits evaluated in this study, it is not possible to discard nonnegligible sampling variability on the estimates of Spearman correlation coefficients or other statistical reasons related to unstable behavior of correlations within extreme tails.
The very low magnitude of the posterior mean of a n observed in the swine data application indicates that residual variability is not homogeneous across environmental subclasses. However, extreme withincluster residuals, for example, due to preferential treatment, may be a reasonable concern even after accounting for residual heteroskedasticity. Biased prediction of breeding values is a problem often encountered under conditions of preferential treatment (Kuhn and Freeman 1995). Our observation of substantial reranking of extreme breeding values suggests that heteroskedastic WGP models may, at least partially, offset prediction bias due to preferential treatment. Yet, preferential treatment is a concern that further motivates the need to extend heteroskedastic error WGP models to allow for outlier robustness, as advocated by Gianola and Rosa (2015). In a nongenomic application, Cardoso et al. (2007) extended the univariate t-models proposed by Stranden and Gianola (1999) to attenuate adverse effects of preferential treatment and specified Student-t distributed residual heteroskedasticity across environments to potentially accommodate a more robust analysis capable of muting the influence of extreme observations on inferences of cluster specific residual variances. Given the breeding objective of ranking candidates for selection, a heavy-tailed residual distribution combined with explicit modeling of environmental heteroskedasticity is likely to yield more robust genomic predictions in the sense of reducing influence from outlying identifiable clusters and extreme individual datapoints.
Most recently, WGP models have been used to predict complex human traits, such as risk of disease and life expectancy Vazquez et al. 2012). One can surmise that environmentallydriven heteroskedasticity is likely present in this context as well, though the full extent of it remains unclear. Predictive performance of WGP models for human traits can be low, mostly due to factors unique to human populations, such as unrelatedness of individuals and short LD patterns (de los Campos et al. 2013b). Extending WGP models for complex human traits to explicitly model heterogeneous residual variances across environments could potentially help account for stillunexplained variance, and thus affect the extent of missing heritability in human populations.
Finally, WGP models often require estimation of a large number of SNP marker effects and considerations for computing efficiency become paramount, particularly since MCMC inference can be computationally expensive. Computational enhancements for homoskedastic WGP models have been developed based on expectation-maximization (EM) based algorithms (Hayashi and Iwata 2010), or analytically derived posterior densities of each marker effect (Meuwissen et al. 2009). Heteroskedastic error extensions to WGP models, such as those presented here, could also be further modified to enhance computational efficiency. For example, an EM-like algorithm such as that proposed by Gianola et al. (1992) may be adapted to obtain empirical Bayes estimates of environment-specific variances in a WGP context.

Conclusions
In this study, we describe extensions to classical whole-genome prediction models that incorporate modeling of heterogeneous residual variances across environments and evaluate potential impact of specifying heteroskedastic vs. homoskedastic error models on the accuracy of prediction of genomic breeding values. Heteroskedastic error models were overwhelmingly supported by improved global fit to the data. The advantages of heteroskedastic error WGP models on overall predictive ability of carcass traits in pigs was of small magnitude, if at all present; however, considerable reranking of individuals with extreme genetic merit was observed when heteroskedasticity was explicitly accounted for. Heteroskedastic error WGP modeling should be carefully considered in breeding programs as environmental residual heteroskedasticity seems prevalent and, if unaccounted, can have considerable practical implications for selection of individuals of extreme genetic merit. Additional work tackling simultaneous modeling of heterogeneous genetic variances jointly with heterogeneous error variances and potential outlier robustness extensions is warranted.

ACKNOWLEDGMENTS
Wenzhao Yang and the MSU Department of Animal Science are gratefully acknowledged for facilitating partial R code and for preparing the data used in this study. This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2011-67015-30338 from the U.S. Department of Agriculture National Institute of Food and Agriculture. Computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CNS-1006860, EPS-1006860, and EPS-0919443.