A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments.

applications are found in the machine learning community, and there is enough empirical evidence that the VBM tends to be faster than classic methods such as MCMC sampling. Also, this method is easier to scale to large data-it has been applied to problems such as large-scale document analysis, high-dimensional neuroscience, and computer vision, as well as in computational biology, robotics, speech recognition, marketing, optimal control, reinforcement learning, statistical network analysis, astrophysics, and the social sciences (Blei et al. 2016). However, the VBM has been studied less rigorously than MCMC, and its statistical properties are less understood (Blei et al. 2016).
Most VBMs applied in genomic-enabled prediction use a simple linear model of the form y i ¼ m þ P m j¼1 x ij b j þ e i ; where y i is the phenotype of the ith individual, m is the sample mean, x ij defines the genotype of the jth marker, and e i is an error term normally distributed with mean zero and variance s 2 e (Logsdon et al. 2010;Carbonetto and Stephens 2012;Yamamoto et al. 2017;Li and Sillanpää 2012;Teschendorff et al. 2005). Exceptions to the use of simple linear models in VBM were studied by Hayashi and Iwata (2013), who modeled multiple traits without effects of environments and without interaction terms; by Arakawa et al. (2016), who used a model for univariate responses with mixed effects without interaction terms; and in the research of Li and Sillanpää (2013), who used a model for functional regression without random effects and without interaction terms.
Based on the above, the purpose of this study was to propose a univariate genomic VBM that takes into account genotype · environment interaction (G·E), and could be an alternative to the existing Bayesian and non-Bayesian models with G·E described by Jarquín et al. (2014) in the reaction norm model. Since animal and plant breeding programs today perform genetic evaluations of large numbers of lines, or animals in many environments, for several traits and genotyped with thousands (or millions) of genetic markers, complex Bayesian models may take days or weeks to do this, and, as a result, the existing models become untenable.
For this reason, in order to retain all the useful features of the Bayesian genomic models with G·E at a reduced computational cost, we propose, within the general framework of variational Bayes, a method for estimating the univariate genomic model with G·E, which we call the variational Bayes multi-environment (VBME) model. This means that the VBME will recast the problem of computing posterior densities by the Gibbs sampler of the conventional Bayesian methods with G·E-referred to here as the Bayesian multi-environment (BME) model-as an optimization problem using the VBM, which consists of introducing a class of approximating distributions of the latent variables, then optimizing some criteria to find the distribution within this class that best matches the posterior (Carbonetto and Stephens 2012). Also, it is important to point out that the newly proposed method uses, as a prior for all standard deviations (SDs), a half-t distribution that is a high noninformative prior distribution (Huang and Wand 2013). We derived the full conditional and variational posterior distribution of the VBME model, explained its implementation, and applied it to eight experimental genomic (maize and wheat) data sets with two main objectives: (1) to compare the accuracy of the VBME with that of the standard BME model; and (2) to compare the computational processing time of VBME with that of BME.

MATERIALS AND METHODS
The variational Bayes method using the density transform approach The most common variants of the VBM are the density transform approach and the tangent transform approach. We will explain the basis of the density transform approach; readers interested in understanding the tangent transform approach can consult Ormerod and Wand (2010). The density transform approach consists of approximating the posterior densities through other densities for which inference is more tractable. In Bayesian statistics, the inference problem is to compute the conditional distribution of unknown parameters given the observations p(u│y) = p(y,u)/p(y); the denominator is known as the marginal distribution of the observations or model evidence, u2Q is the parameter vector, and pðy; uÞ is the joint likelihood of the data and model parameters. The essence of the density transform approach is to approximate the posterior density pðu=yÞ by a function qðuÞ, for which the q-dependent lower bound ½ pðy; qÞon the marginal likelihood is given by  (Ormerod and Wand 2010). To achieve tractability, p ðy; qÞ is maximized over a restricted, more manageable, q class of densities.
Here, for the implementation, we chose, as restrictions for the q class of densities, the mean field variational family restriction, where the latent variables are mutually independent, and each is governed by a distinct factor in the variational distribution, that is, qðuÞ factorizes into Q m j¼1 q j ðu j Þ for some partitions fu 1 ; . . . ; u m g of u: Each u j is governed by its own distribution q j ðu j Þ, and, in principle, each variational factor can take on any parametric form that is appropriate for the corresponding random variable. In the optimization process, these variational factors are chosen to maximize the evidence lower bound given in Equation (2). One advantage of using the mean field variational family restriction over qðuÞ is that it is possible to obtain explicit solutions for each product component in terms of the others when conjugate priors are used. These lead to an interactive scheme for obtaining simultaneous solutions. The solutions rely on result 2.1.
Result. 2.1. Let v and w be continuous random vectors with joint density pðv; wÞ: Then is attained by qpðvÞ ¼ pðvjwÞ: One disadvantage of the mean field variational family restriction is its tendency to underestimate the variability of parameter estimates; it is also not a good option when the parameters under study have a high degree of dependence, which causes degradation in the resulting inference. Conversely, if the posterior dependence between parameters is weak, then the product density restriction could lead to very accurate approximate inferences (Ormerod and Wand 2010;Blei et al. 2016). When conditional conjugate priors are used, given the other variational factor components, the optimal densities of each variational factor, q j ðu j Þ; are proportional to the exponentiated expected log of the full conditionals obtained with Gibbs sampling, that is, where pðu j ELSEÞ are the known full conditionals in the MCMC literature, and the expectations on the right-hand side of Equation (3) are computed with respect to the density q(u)/q(u j ). From this equation, an interactive scheme can be used to obtain the optimal densities. Equation (3) allows obtaining all the variational posteriors for implementing the VBM using mean variational approximation.

Statistical model
Since we wanted to develop a variational Bayes version of the BME model, we used y ijk to represent the normal phenotype from the k th replication of the j th line in the i th environment ði ¼ 1; 2; . . . I; j ¼ 1; 2; . . . ; J; k ¼ 1; 2; . . . ; KÞ; where K represents the number of replicates of each line in each environment under study. Therefore, the total number of observations is n ¼ I · J · K: Therefore, the BME model is where E i represents the effect of i th environment, g i represents the genomic effect of j th line, and is assumed as a random effect, gE ij is the interaction between the genomic effect of the j th line and the i th environment, and is assumed a random effect, and e ijk is a random error term associated with the k th replication of the j th line in the i th environment.
In matrix notation, the model given in Equation (4) including all the information is expressed as: where Y is the response vector of order n · J; X is the design matrix of environment effects of order n · I; b is the regression coefficient vector of order I · 1 associated with the environment effects, Z 1 is the design matrix of order n · J; associated with the random effects of lines, b 1 is the vector of random effects of lines of order J · 1; Z 2 is the design matrix of order n · IJ; associated with the interaction random effects between lines and environments, b 2 is the vector of interaction random effects of lines and environments of order IJ·1; and e is the vector of random error terms of order n·1. Then, we assumed that b 1 $ Nð0; Gs 2 1 Þ; b 2 $ Nð0; G 2 s 2 2 Þ, and e $ Nð0; I n s 2 e Þ; also that this vector is statistically independent, where G denotes the genomic relationship matrix, and was calculated as suggested by VanRaden (2008). G 2 ¼ I I 5G; 5 denotes a Kronecker product, where I I is an identity matrix of order I · I; which indicates that we are assuming independence between environments.

BME model
The joint posterior density of the parameter vector of the BME model is given by It is important to point out that the hierarchical priors given to the SDs induce Half-t priors to achieve arbitrary high noninformativity in this parameter (Huang and Wand 2013). Note that the use of this type of priors of our proposed model for genomic-enabled prediction is different than the priors used in the context of the existing Bayesian univariate models with G·E. Details of all full conditional distributions derived from the Gibbs sampler for implementing these BME and VBME models are given in Appendices A and B, whereas the R code for implementing the VBME model is provided in Appendix D.
Variational version of the BME model As an alternative to the Gibbs sampler for estimating the parameters, in this section, we will derive the optimal densities under the mean field variational family restriction, assuming the following factorization of qðuÞ: The difficult problem of finding the exact posterior (6) became less difficult, i.e., it now consists of finding an approximate parametric posterior qðb; b 1 ; b 2 ; s 2 1 ; s 2 2 ; s 2 b ; s 2 e ; a b ; a b1 ; a b2 ; a e Þ with moments (i.e., parameters). Inference then reduces to finding a density q that minimizes a measure of dissimilarity between q and p: This can be achieved by maximizing the log of the evidence lower bound that is an approximation of the BME model using Equation (6) with respect to (the moments of) q: For details, see MacKay (1995), Attias (2000), Ghahramani and Beal (2001) and Bishop (1999).
Because in the Bayesian formulation of the model given in Equation (5), we useconditionalconjugatepriorsoverparametersofthecorresponding partition of u; in the factorization assumed for qðuÞ in Equation (7), all the optimal distributions (Equation 3) of the VBME model of each variational factor, q j ðu j Þ; belong to the same family as the full conditional distributions for the BME model. Because the same Bayesian model (BME) was converted to its variational counterpart (VBME), both models are comparable. The derivation details of all these densities are given in Appendix A.
Algorithm for implementing the VBME Initialize: m qðbÞ ¼ 0; m qðb1Þ ¼ 0; m qðb2Þ ¼ 0; Σ qðb1Þ ¼ I J ; Σ qðb2Þ ¼ I IJ ; B qðs 2 b Þ ; B qðabÞ ; B qðs 2 1 Þ ; B qða b1 Þ ; B qðs 2 2 Þ ; B qða b2 Þ ; B qðs 2 e Þ ; B qðaeÞ . 0:Cycle: þ n e m qðs 22 e Þ until the change in the lower bound, log p ðy; qÞ; between iterations is less than a tolerance value specified a priori. For our choice of approximating distribution (see Appendix B), in each cycle of the algorithm, logðp ðy; qÞÞ has the following analytical expression: To obtain (8), several substitutions were made for simplification; it is valid only for the VBME model proposed. In this algorithm, the updates of the parameters should be as given in the order proposed for the algorithm.

Model implementation
Recall that the hyperparameters of the prior distributions are values given by the user in the Bayesian paradigm. In this case, the hyperparameters used for the BME model given in the previous section were: All these hyperparameters were chosen to lead weakly informative priors. The implementation of both models (BME and VBME) was done in the R statistical software (R Core Team 2016). The full conditional and posterior distributions given in Appendix A were used for the BME and VBME models, respectively. Also, for the VBME model, an R code is given in Appendix D. The BME model was implemented with the MCMC method with 40,000 iterations with a burn-in of 20,000 and a thinning of five; therefore, 4000 samples were used for inference.

Assessing prediction accuracy
Prediction accuracy was assessed using 20 training (trn)-testing (tst) random partitions; we used this approach because one can obtain as many partitions as one needs with a replicated trn-tst design. The implemented cross-validation (CV1) mimics a situation where lines were evaluated in some environments for the trait of interest, but some lines are missing from all the other environments. We assigned 80% of the observations to the trn set, and the remaining 20% to the tst set. Of the variety of methods for comparing the predictive posterior distribution to the observed data (generally termed "posterior predictive checks"), we used the Pearson correlation. Models with higher correlation values indicate better predictions. Under the BME, the predicted observations were calculated asŶ ¼ Xb þ Z 1b1 ; þZ 2b2 whereb;b 1 and b 2 ; are estimates of b; b 1 ; and b 2 ; these estimates correspond to the posterior mean of s collected Gibbs samplers to estimate the predicted values after discarding those used for the burning period, while under the VBME,Ŷ ¼ Xm q ðbÞ þ Z 1 m qðb1Þ þ Z 2 m qðb2Þ ; as described in the algorithm for implementing the VBME.

Experimental data sets
Here, we present the information on the data sets used for implementing the proposed models. A total of eight data sets were used (three maize and five wheat).
Maize data sets 1-3: A total of 309 doubled haploid maize lines was phenotyped and genotyped; this is part of the data set used by Crossa et al. (2013) and Montesinos-López et al. (2016), which comprised a total of 504 doubled haploid lines derived by crossing and backcrossing eight inbred lines to form several full-sib families. Traits available in this data set include grain yield (Maize_GY), anthesis-silking interval (Maize_ASI), and plant height (Maize_PH); each of these traits was evaluated in three optimum rainfed environments (Env1, Env2, and Env3). The experimental field design used in each of the three environments was an alpha lattice incomplete block design with two replicates. Data were preadjusted using estimates of block and environmental effects derived from a linear model that accounted for the incomplete block design within environment, and for environmental effects.
The genomic data were obtained through genotyping-by-sequencing (GBS) for each maize chromosome; the number of markers after initial filtering and the number of markers after imputation were summarized in Crossa et al. (2013). Filtering was first done by removing markers that had .80% of the maize lines with missing values, and then markers with minor allele frequency #0.05 were deleted. The total numbers of GBS data were 681,257 single nucleotide polymorphisms (SNPs), and, after filtering for missing values and minor allele frequency, 158,281 SNPs were used for the analyses. About 20% of cells were missing in the filtered GBS information used for prediction; these missing values were replaced by their expected values before doing the prediction.
Wheat data sets 4-5: The first two wheat data sets came from a total of 250 wheat lines that were extracted from a large set of 39 yield trials grown during the 2013-2014 crop season in Ciudad Obregon, Sonora, Mexico (Rutkoski et al. 2016). The trials were sown in mid-November and grown on beds with five and two irrigations plus drip irrigation. Days to heading (Wheat_DTHD) were recorded as the number of days from germination until 50% of spikes had emerged in each plot, in the first replicate of each trial, while plant height (Wheat_PTHT) was recorded in centimeters.
For the first two wheat data sets, GBS was used for genome-wide genotyping. Single nucleotide polymorphisms were called across all lines using the TASSEL GBS pipeline anchored to the genome assembly of Chinese Spring. Single nucleotide polymorphisms were extracted and markers were filtered so that percent missing data did not exceed 80 and 20%, respectively. Individuals with .80% missing marker data were removed, and markers were recorded as 21, 0 and 1, which indicate homozygous for the minor allele, heterozygous, and homozygous for the major allele, respectively. Next, markers with ,0.01 minor allele frequency were removed, and missing data were imputed with the marker mean. A total of 12,083 markers remained after marker editing.
Wheat data set 6: This data set, from CIMMYT's Global Wheat Program, was used by Crossa et al. (2010) and Cuevas et al. (2016a,b) and includes 599 wheat lines derived from 25 yr  of Elite Spring Wheat Yield Trials (ESWYT). The environments represented in these trials are four agroclimatic regions (mega-environments). The phenotypic trait considered here was grain yield (Wheat_Yield1) of the 599 wheat lines evaluated in each of the four mega-environments. The 599 wheat lines were genotyped using 1447 Diversity Array Technology (DArT) markers generated by Triticarte Pty. Ltd. (Canberra, Australia; http://www.triticarte.com.au). Markers with a minor allele frequency (MAF) (0.05) were removed, and missing genotypes were imputed using samples from the marginal distribution of marker genotypes. The number of DArT markers after edition was 1279.
Wheat data sets 7-8: These two data sets were described and used by Lopez-Cruz et al. (2015) and Cuevas et al. (2016b) for proposing a marker·environment interaction model. The phenotypic data consisted of adjusted grain yield (tons/hectare) records collected during three evaluation cycles of different inbred lines evaluated in different environments, and denoted as Wheat_Yield2 and Wheat_Yield3. The environments were three irrigation regimes (moderate drought stress, optimal irrigation, and drought stress), two planting systems (bed and flat planting), and two different planting dates (normal and late). All trials were planted using a lattice design with three replicates in each environment at CIMMYT's main wheat breeding station in Cd. Obregon, México. The phenotype used in the analysis was the Best Linear Unbiased Estimate (BLUE) of grain yield obtained from a linear model applied to the alpha-lattice design of each cycle-environment combination. The trait Wheat_Yield2 had 693 wheat lines evaluated in four environments, and the trait Wheat_Yield3 included 670 wheat lines evaluated in four environments. Genotypes were derived using GBS technology, and markers with a MAF of 0.05 were removed. All markers had a high incidence of uncalled genotypes, so we applied thresholds for incidence of missing values, and focused on maintaining relatively large and similar numbers of markers per data set. After editing the missing markers, we had a total of 15,744 GBS markers for analyzing traits Wheat_Yield2 and Wheat_Yield3.

Data availability
Phenotypic and genotypic data on the eight data sets included in this study can be downloaded from http://hdl.handle.net/11529/10907.

RESULTS
The results of this research are given in two sections. The first section compares the parameter estimates and implementation time under both models (BME and VBME). The second section compares the prediction accuracy of both models in each of the eight data sets under study.
Comparing parameter estimates and the implementation time of models BME and VBME Table 1 gives the parameter estimates under both models (BME and VBME) of the b coefficients, variance components (s 2 1 ; s 2 2 and s 2 e ), Pearson correlation between the observed and predicted phenotypes of the full data sets, and the time in minutes required to implement each model on the data sets under study. In general, the beta coefficient estimates obtained under the BME and VBME models are very similar, but there are substantial differences in the estimates of variance components, given that a little more than half of them were lower under the VBME model as compared to those estimates obtained under the BME. Also, in seven out of eight data sets, the Pearson correlation between observed and predicted genotypes was larger under the BME, while in seven out of eight data sets, the implementation time was $10 times lower under the VBME than under BME. Table 2 shows details of the comparison between the proposed VBME model vs. the BME model. The smallest difference between the beta coefficients was observed in b 1 for data set Wheat_Yield1 (1%), while the largest difference was observed in b 3 for data set Wheat_Yield2 (25.3%); the estimates of the BME model were lower. With regard to the variance components, Table 2 shows that the smallest difference was observed in s 2 1 for Wheat_Yield3 (3.8%), while the largest difference was also in s 2 1 for data set Maize_PH, where the estimated variance component of the VBME was 5.68 times higher than the estimated variance component of the BME model.
In terms of the Pearson correlation, in seven out of eight data sets, the best correlations were observed in the BME model; the minimum difference (1.6%) was observed in data set Wheat_Yield3, while the maximum difference (15.4%) was observed in Wheat_PTHT. Finally, in terms of implementation time (in minutes), in seven out of eight data sets the VBME model was faster than the BME model, with a minimum difference of 16.9%, and a maximum difference of 98.2%. The computer used to implement both models for each of the eight data sets (reported in Table 1 and Table 2) was a Windows operating system with processor Intel(R) Core(TM) i7-4610M CPU @ 3.0 GHz under a 64-bit operating system.
Comparing the prediction ability of models BME and VBME Table 3 presents prediction accuracies for both models resulting from the 20 trn-tst random partitions implemented for each of the eight real n Table 1 Parameter estimates, implementation time and Pearson correlation between the observed and predicted values for the complete data sets under the BME and VBME models The implementation time is given in minutes. Mean and SD under the BME were obtained as the Mean and SD posteriors, and, under the VBME, the Mean and SD were obtained as the mean and SD of the variational posterior.
n Table 2 Relative comparison using the full data sets between the BME and VBME calculated as parameter estimate under BME minus parameter estimate under VBME divided by parameter estimate under BME data sets under study. Table 3 gives the prediction accuracies for each environment in each data set. In general, the prediction accuracies of the BME are slightly higher than those of the VBME model. However, it is interesting to point out that, even under the VBME, the prediction accuracies are comparable to those found under the BME. Table 4 shows that in five out of eight data sets the predictions under the BME model were the best. Also, the lowest difference in the five data sets in which the BME model was the best was 4.2% in favor of the BME in Env1 in data set Wheat_Yield2, while the largest difference was 52.9% in favor of the BME in Env4 in data set Wheat-Yield1. For the three data sets in which the VBME was the best, the smallest difference was 1.2% in Env2 in data set Wheat_DTHD, and the largest difference was 138.1% in Env1 in data set Wheat_Yield1.

DISCUSSION
We proposed and implemented the variational Bayes version of the BME model. We found that the b coefficients were estimated in a similar way in both models (BME and VBME), but the variance components were considerably underestimated in the VBME model. However, in terms of prediction accuracy, the BME was a little better than the VBME model. The VBME model is a quick and a good approximation to the BME model. These results were expected, since, under MCMC methods, we estimate the parameters using the mean posterior from samples of the target distribution (posterior distribution), while, in the second approach for estimating the parameters, we also use the mean, but from the approximated target distribution, here obtained under the VBM and called variational posterior distribution. The above is documented in some papers that reported that the VBM showed less accuracy than its MCMC counterpart. This is due to the fact that variational Bayes methods approximate the target posterior distribution by a factorized distribution that has minimum KL distance to the target posterior. The factorized distribution we used was the mean field variational family restriction, which makes strong independent assumptions to achieve great flexibility and scalable optimization at the price of low accuracy and underestimation of posterior variances (Blei et al. 2016). In contrast, MCMC methods such as the BME model have a high computational cost with the advantage that they generate samples directly from the target posterior distribution that are guaranteed to converge asymptotically to the true posterior distribution. The new proposed VBME model for univariate phenotypes with G·E in-teraction is a step ahead of the models proposed until now, in the context of genomic-enabled prediction models proposed by other authors and described in the Introduction. In this paper, we propose a genomic VBM model using a half-t priors distribution, and give clear theoretical details of the derivation of the full conditional and posterior distributions, and account for G·E interaction.
Advances in genetic engineering allow collecting and storing millions of markers (SNPs, etc.) from plant and animal genomes, and many breeding programs are interested in improving many correlated traits simultaneously. For these reasons, in genomic selection there is a great need to use genome-enabled prediction models that take into account all available markers, traits, and environments under study to improve the prediction of genotypes early in time for multiple correlated traits. However, due to the large amount of input information for Bayesian methods (such as those used in the BME model), these statistical models and methods might become intractable, or impose n Table 3 Prediction accuracy of the BME and VBME models resulting from the 20 trn-tst random partitions implemented n Table 4 Relative comparison of the prediction accuracies of the BME and VBME models calculated as APC under BME minus APC VBME divided by the APC under BME The smallest and largest differences in terms of prediction accuracy for both models are in boldface. In the bottom half of this table, 1 means that the best model was BME and 0 means that the best model was VBME in terms of prediction accuracy.
a very high computational cost in terms of time. For these reasons, and because it offers a significant improvement in computational speed at the cost of a small loss in prediction accuracy, we believe that the proposed VBME model is an attractive alternative to the BME model.

Advantages and disadvantages of the variational model (VBME)
As already mentioned, the variational model (VBME) was proposed due to the need to improve the speed of the BME. For this reason, we propose the iterative algorithm given in the section Algorithm for implementing the VBME, which approximates the BME model in a reasonable way. Results show that the VBME method: (1) is sometimes more flexible than its Bayesian counterpart; (2) offers a reasonable approximation to the posterior distributions of all model parameters; (3) most of the time is much less demanding in computational time than the BME built using MCMC techniques; and (4) maintains most of the advantages of Bayesian inference. However, the VBME has some limitations: (1) the approximation cannot become more accurate by increasing computational time; (2) part of the VBME's advantage in computing time is because the model is simple; however, this algorithm needs to be adapted to cases that are different from the one used here; (3) the VBME, like all variational Bayes methods, does not guarantee that it will converge to a global maximum; (4) the convergence time depends strongly on the level of tolerance desired and the nature of the data; and (5) it has been documented, and was observed here, that variational models underestimate some covariance parameters. However, the time reduction depends strongly on the tolerance level: the lower the value, the longer the convergence time. For this reason, although our results show a significant reduction in time, this gain can be reduced substantially if the chosen tolerance level is smaller. Our empirical results when implementing the VBME in eight real data sets show that a tolerance level of 0.00001 is enough, because when we increased it to 1E28, no significant gain in prediction accuracy or accuracy in parameter estimates was observed. Also, since in developing our VBME, we used the mean field variational family restriction that approximates qðuÞ by the product of q latent independent variables to gain tractability, this may underestimate the variability of parameter estimates when there is a considerable degree of dependence. Another important disadvantage of the VBM is that deriving the set of equations (variational posterior and lower bound) used to iteratively update the parameters often requires a large amount of algebra compared with deriving the comparable Gibbs sampling equations. This is evident in Appendix A for our proposed VBME model, which, despite being a simple model, required a lot of algebra and notation to obtain the variational posterior distribution and the corresponding lower bound. This amount of algebra needs to be done each time for every specific model. To take advantage of the proposed model, we provide some guidance on when to use the VBME or the BME. The BME (an MCMC method) tends to be more computationally intense than the VBME (a variational method) but guarantees that exact samples from the target density are (asymptotically) produced (Robert 2004). While the VBME does not guarantee exact samples, only approximate ones (that is, it can only find a density close to the target), most of the time it is faster than MCMC methods. Since the VBME is based on optimization, it can more easily take advantages of stochastic optimization and distributed optimization. For these reasons, the VBME is expected to be better suited to larger data sets than the BME and for scenarios where we want to quickly explore many models. Also, Blei et al. (2016) point out that the VBME is preferred when the posterior of a mixture model admits multiple modes, since MCMC techniques are not an option for this scenario, even with small data sets.
Finally, we believe that the VBME method, as proposed here, is a useful addition to the existing arsenal of genome-enabled prediction models; it is also an alternative to the BME model, since it considerably reduces implementation time. Additionally, this research opens a new branch of methodologies for fitting genomic models to genomic data that need novel methods to take advantage of the huge amount of data available nowadays. For these reasons, we agree with Ormerod and Wand (2010), who pointed out that the usefulness of variational approximations increases as the size of the problem increases, and Monte Carlo methods such as MCMC start to become untenable. We also believe that our research contributes significantly to researchers' understanding of the VBM, because, in developing the proposed VBME, we provided all the details of the derivation of the variational posterior distributions needed to derive the q-specific evidence lower bound for building the algorithm, and the R scripts for implementing the proposed VBME model. This information gives the scientist the basic elements needed to develop new models under the variational Bayes approach. It is also important to point out that we provide all the details for deriving the full conditionals needed for implementing the BME model since the proposed model uses Half-t priors that produce noninformative variance components (Huang and Wand 2013).

Conclusions
In this paper, we propose a new alternative to the BME via variational Bayes, which we called the VBME model. The proposed method underestimated the variance components and similar estimates of the beta coefficients were found; in terms of prediction accuracy, the VBME had lower prediction accuracies than the BME. However, according to our results, the loss in prediction accuracy is compensated for by the significant gain in implementation time, since the VBME is much faster than the BME. However, it is important to point out that if a very low tolerance value is selected, the gain in terms of implementation speed will vanish. For this reason, to take advantage of the proposed VBME, a reasonable tolerance value should be chosen that guarantees the quality of the parameter estimates and predictions, as well as a significant reduction in implementation time.
We also believe that the proposed model should be tested on many real data sets to obtain more empirical evidence of its performance; this evidence would allow us to extend the proposed variational Bayes model to take into account nonnormal responses (binary, Poisson, negative binomial, etc.), for multiple traits and multiple-environment models (Montesinos-López et al. 2016), which would enable scientists to make a more precise selection of candidate genotypes at early crop stages. Also, models that avoid the product density restriction can be proposed. Finally, the proposed VBME model is a useful addition to the existing arsenal of genome-enabled prediction models, and an excellent alternative to the BME model, since it produces competitive predictions at a lower computational cost.