A Genomic Bayesian Multi-trait and Multi-environment Model

When information on multiple genotypes evaluated in multiple environments is recorded, a multi-environment single trait model for assessing genotype × environment interaction (G × E) is usually employed. Comprehensive models that simultaneously take into account the correlated traits and trait × genotype × environment interaction (T × G × E) are lacking. In this research, we propose a Bayesian model for analyzing multiple traits and multiple environments for whole-genome prediction (WGP) model. For this model, we used Half-t priors on each standard deviation term and uniform priors on each correlation of the covariance matrix. These priors were not informative and led to posterior inferences that were insensitive to the choice of hyper-parameters. We also developed a computationally efficient Markov Chain Monte Carlo (MCMC) under the above priors, which allowed us to obtain all required full conditional distributions of the parameters leading to an exact Gibbs sampling for the posterior distribution. We used two real data sets to implement and evaluate the proposed Bayesian method and found that when the correlation between traits was high (>0.5), the proposed model (with unstructured variance–covariance) improved prediction accuracy compared to the model with diagonal and standard variance–covariance structures. The R-software package Bayesian Multi-Trait and Multi-Environment (BMTME) offers optimized C++ routines to efficiently perform the analyses.

Since the whole-genome prediction (WGP) model of Meuwissen et al. (2001), practical results have shown that genomic selection (GS) using Bayesian and non-Bayesian linear regression models improves prediction accuracy compared to conventional and pedigree selection (de los Campos et al. 2009Crossa et al. 2010Crossa et al. , 2011Heslot et al. 2012;Pérez-Rodríguez et al. 2012). With GS, genomic breeding values are estimated as the sum of marker effects for genotyped individuals in the testing or prediction population. The marker effects are estimated simultaneously using a training population that contains phenotyped and genotyped individuals.
In plant breeding, most of the available methods for WGP are useful for analyzing a single trait measured either in a single environment or in multi-environments with the incorporation of genotype · environment interaction (G · E) (Burgueño et al. 2012;Heslot et al. 2014;Jarquín et al. 2014, Montesinos-López et al. 2015, López-Cruz et al. 2015. However, researchers often face situations in which multiple traits are measured across multiple environments. For example, crop breeders record phenotypic data for multiple traits such as grain yield and its components (e.g., grain type, grain weight, biomass, etc.), grain quality (e.g., taste, shape, color, nutrient content), and tolerance to biotic and abiotic stresses. They often aim to improve all these multiple correlated traits simultaneously or to predict the ones that are difficult to measure with those that are easy to measure. However, it is common practice to perform an independent analysis and genomic prediction on a single phenotypic trait.
The advantage of jointly modeling multiple traits compared to ana-et al. 1984;Schaeffer 1984). In the context of WGP, Jia and Jannink (2012), Guo et al. (2014), and Jiang et al. (2015) found that joint prediction of multiple traits benefits from genetic correlation between traits and significantly improves prediction accuracy compared to single trait methods, specifically for low-heritability traits that are genetically correlated with a high-heritability trait. Jia and Jannink (2012) also found better prediction accuracy for multiple traits than for single traits when phenotypes are not available for all individuals and traits. Therefore, there is evidence that multiple trait analysis is useful to predict yet-to-be observed phenotypes in plant and animal breeding when selecting unphenotyped candidates early through the prediction of their genomic breeding values. Multi-trait analysis has also been found to substantially increase prediction accuracy when some traits are observed in all individuals but the trait of interest is not observed in the individuals in the test set (Pszczola et al. 2013;Rutkoski et al. 2016). Multivariate analysis of continuous outcomes is well established in statistical literature (Johnson and Wichern 1992). However, the available methods cannot be applied in a straightforward manner for WGP, since the number of independent variables (p) is usually larger than the available sample size (n). The genomic best linear unbiased predictor (GBLUP) WGP model can be implemented in standard software for multiple traits and multiple environments by taking into account twoway interaction terms and estimating separable unstructured covariance matrices of the form A5B (where A and B are the corresponding covariance matrices of factors A and B, respectively). However, these software programs are unable to estimate separable unstructured variance-covariance matrices of the form A5B5C for three-way interaction terms. For this reason, in this situation, at least one of the variance-covariance components is assumed to be identity or a new variable is created by merging two factors and estimating a covariance matrix with only two components as A5B Ã , where B Ã contains the variance-covariance of two factors, but each component cannot be separated. Also, univariate Bayesian inference has been proposed and extensively implemented in WGP models (Gianola 2013). The Bayesian alphabet methods (Bayes A, Bayes C, and Bayes Cp) have been extended for multiple trait analysis (de los Campos and Gianola 2007;Calus and Veerkamp 2011;Jia and Jannink 2012;Guo et al. 2014) and most recently, Jiang et al. (2015) proposed a Bayesian multivariate antedependence model.
Despite evidence of the increased prediction accuracy of WGP models incorporating G · E (Burgueño et al. 2012;Jarquín et al. 2014, Montesinos-López et al. 2015López-Cruz et al. 2015) and of WGP models for multi-trait data, statistical models for analyzing continuous data for simultaneously assessing multi-traits and multi-environments are lacking. Thus, the integration of these two approaches in one unified WGP model is required (Jiang et al. 2015). This unified WGP model would be useful in two cases: (i) when individuals are measured for all traits in one environment, but only some traits in other environments; and (ii) when some traits are recorded in only a subset of individuals in all environments. This model would be useful not only in plant breeding but also in animal breeding, where genetic evaluation of many traits is performed on a weekly basis by many breeding programs globally. It is also possible to integrate other advantageous strategies such as the antedependence model to incorporate dominant and epistatic effects.
All the Bayesian methods developed so far for multiple trait analysis use the Inverse-Wishart (IW) conjugate family of distributions as priors for the covariance matrices between traits. However, Gelman (2006) and Huang and Wand (2013) argued against using IW priors for covariance matrices because they impose a degree of informativity and the posterior inferences are sensitive to the choice of hyper-parameters. Recently, Huang and Wand (2013) proposed a scale mixture approach involving an IW distribution and independent Inverse-Gamma (IG) distributions for each dimension as priors for the covariance matrix parameters. The ensuing covariance matrix distribution is such that all standard deviation parameters have Half-t distributions and the correlation parameters have uniform distributions on (21,1) for a particular choice of the IW shape parameter. The advantage of this approach is that it is possible to choose shape and scale parameters that achieve arbitrary high noninformativity of all standard deviations and correlation parameters (Huang and Wand 2013). However, the model proposed by Huang and Wand (2013) is a standard mixed model with correlated errors that does not include interaction terms of any kind and does not consider three-way interaction.
In this study, we propose a Bayesian method that integrates the analysis of multi-traits and multi-environments and takes into account trait · genotype · environment interaction (T · G · E) in a unified WGP model. We used Half-t priors on each standard deviation term and uniform priors on each correlation to achieve high noninformativity and posterior inferences that are not sensitive to the choice of hyper-parameters. We illustrate the use of the unified Bayesian Multi-Trait and Multi-Environment (BMTME) method in simulated data sets and two real data sets (one maize and one wheat) including multiple traits measured on wheat and maize lines evaluated in multiple environments and genotyped with dense molecular markers. We also provide an R package called BMTME that can be used to fit the proposed methods.

Statistical model
We use y ðlÞ ijk to represent the normal response from the kth replication of the jth line in the ith environment for the lth trait (i ¼ 1; 2; . . . ; I, j ¼ 1; 2; . . . ; J, k ¼ 1; 2; . . . ; K, l ¼ 1; . . . ; LÞ; where K represents the number of replicates of each line in each environment and L denotes the number of traits under study. To present the theory in a simple manner, we will use I ¼ 3 and L ¼ 3: Therefore, the total number of observations for the lth trait is n ¼ I · J · K: We propose the following linear mixed model for each trait: where E ðlÞ i represents the ith environment for the lth trait and is assumed as a fixed effect, g ðlÞ j represents the genomic effect of jth line in the lth trait and is assumed as random effect, gE ðlÞ ij is the interaction between the genomic effect of the jth line and the ith environment for the lth trait and is assumed a random effect, and e ðlÞ ijk is a random error term associated with the kth replication of the jth line in the ith environment for the lth trait. To take into account the correlation between traits, one could use the following L variate linear mixed model: ijk ; . . . ; y i T $ N L ð0; S t Þ; S t is the genetic covariance matrix between traits and is  (1), we can perform a separate analysis for each trait, with the inconvenience that independence between the L traits is assumed. Model (2) can take into account and exploit the correlation between traits.
In matrix notation, the model given in equation (2) including all the information is expressed as: where Y is of order Ln 3 1, X is of order Ln 3 IL, b is of order IL 3 1, Z 1 is of order Ln · LJ, b 1 is of order LJ · 1, Z 2 is of order Ln · IJL, b 2 is of order IJL · 1 , and e is of order Ln 3 1.
where S E is assumed a diagonal matrix of order I · I, which indicates that we are assuming independence between environments. It is important to point out that the trait · environment (T · E) interaction term is included in the fixed effect b, while the trait · genotype (T · G) interaction term is included in the random effect b 1 and the three-way (T · G · E) interaction term is included in b 2 . The errors are assumed to be correlated with the covariance defined as R ¼ I n 5R e . More flexible variance-covariances as diagonal or identity are straightforward. Also note that G g is of order J · J; therefore, G 1 is of order JL · JL and G 2 is of order IJL · IJL. The matrix of the genomic relationship between lines G g , also known as Genomic Relationship Matrix (GRM), was calculated using the method of VanRaden (2008).
The notation V $ Inverse-Wishartðk; BÞ indicates that the density both are positive definite matrices. We assume that b denotes an Inverse-Wishart distribution with shape n b and scale note an IG distribution with shape 1=2 and scale parameters 1=A 2 b .
b 1 S t $ N JL ð0;G 1 Þ, S t a 1; ...;a L $ IW n t þ L21;2n t diagð 1 a1 ;... 1 aL Þ , Next we combine the joint posterior density of the parameter vector (4) with the priors to obtain the full conditional distribution for parameters b, s 2 b , a b , b 1 , b 2 , S t ; a; R e ; a e . All full conditionals, as well as details of their derivations, are given in Appendix A.

Gibbs sampler
In order to produce posterior means for all relevant model parameters, below we outline the exact Gibbs sampler procedure that we propose for estimating the parameters of interest. As is the case with Markov Chain Monte Carlo (MCMC) techniques, the ordering of draws is somewhat arbitrary; however, we suggest the following order: Step 1. Simulate b according to the normal distribution given in Appendix A (A.1).
Step 2. Simulate s 2 b according to the IW distribution given in Appendix A (A.2).
Step 3. Simulate a b according to the IG distribution given in Appendix A (A.3).
Step 4. Simulate b h for h ¼ 1; 2; according to the normal distribution given in Appendix A (A.4 and A.5).
Step 5. Simulate S t according to the IW distribution given in Appendix A (A.6).
Step 9. Simulate R e according to the IW distribution given in Appendix A (A.10).
Step 11. Return to step 1 or terminate when chain length is adequate to meet convergence diagnostics.

Model implementation
The Gibbs sampler described above for the BMTME model was implemented as an R-software package. We performed a total of 60,000 iterations; 30,000 samples were used for inference because the first 30,000 were used as burn-in to decrease the MCMC errors in prediction accuracy. We did not apply thinning of the chains following the suggestions of Geyer (1992), MacEachern and Berliner (1994), and Link and Eaton (2012), who provide justification of the ban on subsampling MCMC output for approximating simple features of the target distribution (e.g., means, variances, and percentiles). We implemented the prior specification given in the previous section where the BMTME model was defined. The hyper-parameters used were for b . . . ; a L we used n t ¼ 2, for a l we used A l ¼ 100; 000; for l ¼ 1; 2; . . . ; L, for b 2 S t ; S E $ N IJL ð0; G 2 Þ, for s 2 Ei ja Ei we used n Ei ¼ 2 for a Ei we used A Ei ¼ 100; 000 for i ¼ 1; 2; . . . ; I; for R e a e1; . . . ; a eL we used n e ¼ 2; for a el we used A el ¼ 100; 000 for l ¼ 1; ::; L. All these hyper-parameters were chosen to lead weakly informative priors.

Assessing prediction accuracy
We used two cross-validation schemes for generating training and validation sets that mimic two real situations a breeder might face. Cross-validation 1 (CV1) mimics a situation where lines were evaluated in some environments for all traits but some lines are missing in other environments; this is similar to cross-validation 2 of Burgueño et al. (2012). The other cross-validation scheme is CV2, which mimics a situation where a trait is lacking in all lines in one environment but present in the remaining environments (see Table D1 in Appendix D). In this case, information from relatives is used, and prediction assessment can benefit from borrowing information between lines across environments, and among correlated traits. We implemented a 10-fold cross-validation with 80% of the observations in the training set and 20% in the testing set. Of the variety of methods for comparing the predictive posterior distribution to the n Mean and standard error (SE) of the estimated correlations and Mean Squared Prediction Error (MSPE) from the 10-fold cross-validation CV1. The BMTME model was fitted using unstructured (U), diagonal (D), and standard (S) variance-covariance matrix. Environment (E1, E2, E3)-trait (T1, T2, T3) combination. Method stands for the variance-covariance matrix used with the BMTME, E-T for the environment-trait combination, R for rank, and Ave for average. a Since three conditions are compared (unstructured, diagonal, and standard), the values of the ranks range from 1 to 3, and the lower the values, the better the prediction accuracy. For ties, we assigned the average of the ranks that would have been assigned had there been no ties.
observed data (generally termed "posterior predictive checks"), we used two criteria: the mean square error of prediction (MSEP) and the Pearson correlation. Models with small MSEP indicate better predictions, and higher correlation values indicate better predictions. The predicted observations were calculated with S collected Gibbs samplers as:

Simulation data
To illustrate the parameter estimation of the proposed BMTME method, a small simulation experiment was conducted. The data were simulated based on model (3) with three environments, three traits, 80 genotypes, and 20 replications. We assumed that b T ¼ ½15, 8,7,12,6,7,14,9,8], where the first three b coefficients belong to traits 1, 2, and 3 in environment 1, the second three values for the three traits in environment 2 and the last three for environment 3, S t ¼ These two variance-covariance matrices gave rise to a matrix of correlation between traits with each correlation between pairs of traits equal to 0.85. Also, we assume that the GRM is known, G g ¼ 0:7I 80 þ 0:3J 80 , where I 80 is an identity matrix of order 80 and J 80 is a matrix of order 80 · 80 of ones. The relationship between environments is assumed as S E ¼ diagð0:65; 0:55; 0:75Þ. Therefore, the total number of observations was 3 · 80 · 3 · 20 ¼ 14400; i.e., 4800 for each trait. With these parameters, 50 data sets were simulated according to model (3) and for each data set, parameters b T , S t , S E , and R e were estimated with the BMTME model using the Gibbs sampler given above. We used the priors given in the section on model implementation, which were also used for the applications with real data sets. For this simulated data set, we computed 20,000 MCMC samples, and Bayes estimates were computed with 10,000 samples, since the first 10,000 were discarded as burn-in. In Table 1, we report average estimates along with standard deviations (SD). Also, with the proposed BMTME model, we simulated two data sets similar to the simulation study explained above, except that the environmental covariance matrix we used was an identity matrix. The first data set assumes that the genetic and residual correlation between traits was 0.85 for all pairs of traits under study, while the second data set assumes that the correlation between all pairs of traits was 0.2 for both covariance matrices (S t and R e ). We implemented a 10-fold crossvalidation (CV1). The training data set has 80% of the lines (64 lines), while the testing data set has the remaining 20% (16 lines). We assessed the prediction performance using the simulated data set under three conditions: (1) unstructured (Appendix A): assuming both variancecovariances are unstructured (S t and R e ); (2) diagonal (Appendix B): assuming both variance-covariances are diagonal; and (3) standard (Appendix C): assuming both variance-covariances are identity multiplied by the scale parameters s 2 t and s 2 e , respectively.

Real data sets
Maize data set: A total of 309 double-haploid maize lines were phenotyped and genotyped; this is part of the data set used by Crossa et al. (2013) that comprised a total of 504 doubled haploid lines derived by crossing and backcrossing eight inbred lines to form several full-sib families. Traits available in this data set include grain n Posterior mean and SD of the b coefficients (b) for three traits, Yield, anthesis-silking interval (ASI), and plant height (PH) in three environments (E1, E2, and E3). Estimate variance-covariance components for the traits (Ŝ t ), the environments (Ŝ E ), and the residuals (R e ). InŜ t andR e , the upper triangle contains the variancecovariance components and the lower triangle contains the correlations.Ŝ E is a diagonal matrix.
yield (Yield), anthesis-silking interval (ASI), and plant height (PH); each of these traits was evaluated in three optimum rainfed environments (E1, E2, and E3). The experimental field design in each of the three environments was an a-lattice incomplete block design with two replicates. Data were preadjusted using estimates of block and environmental effects derived from a linear model that accounted for the incomplete block design within environment and for environmental effects. Information about genotyping-by-sequencing (GBS) data for each maize chromosome, the number of markers after initial filtering, and the number of markers after imputation, was summarized in Crossa et al. (2013). Filtering was first done by removing markers that had .80% of the maize lines with missing values, and then markers with minor allele frequency lower than or equal to 0.05 were deleted. The total number of GBS data was 681,257 single nucleotide polymorphisms (SNPs) and, after filtering for missing values and minor allele frequency, 158,281 SNPs were used for the analyses. About 20% of cells were missing in the filtered GBS information used for prediction; these missing values were replaced by their expected values before doing the prediction.
Wheat data set: A total of 250 wheat lines were extracted from a large set of 39 yield trials grown during the 2013-2014 crop season in Ciudad Obregon, Sonora, Mexico (Rutkoski et al. 2016). The trials were sown in mid-November and grown on beds with 5 and 2 irrigations plus drip irrigation. Days to heading (DTHD) were recorded as the number of days from germination until 50% of spikes had emerged in each plot, in the first replicate of each trial. Grain yield (GRYLD) was the total plot grain yield measured after maturity, and plant height (PTHT) was recorded in centimeters.
Image data of the yield trials were collected using a hyper-spectral camera (A-series, Mirco-Hyperspec VNIR, Headwall Photonics, Fitchburg, Massachusetts) mounted on a manned aircraft. From this data, vegetative indices for each plot were calculated. The green normalized difference vegetation index (GNDVI) was one of the traits used in this study. Trait GNDVI is considered a good predictor when used with pedigree and/or genomic prediction of GRYLD in wheat due to its high heritability and genetic correlation with GRYLD. Also, trait GNDVI can be measured remotely in large numbers of candidates for selection.
Genotyping-by-sequencing was used for genome-wide genotyping. Single nucleotide polymorphisms were called across all lines using the TASSEL GBS pipeline anchored to the genome assembly of Chinese Spring. Single nucleotide polymorphism calls were extracted and markers were filtered so that percent missing data did not exceed 80% and 20%, respectively. Individuals with .80% missing marker data were removed, and markers were recorded as 21, 0, and 1, indicating homozygous for the minor allele, heterozygous, and homozygous for the major allele, respectively. Next, markers with ,0.01 minor allele frequency were removed, and missing data were imputed with the marker mean. A total of 12,083 markers remained after marker editing.
n Mean and SE of the estimated correlations and MSPE from the 10-fold cross-validation CV1. The BMTME model was fitted using unstructured, diagonal, and standard variance-covariance matrices. Environment (E1, E2, E3)-trait (Yield, ASI, PH) combination. a Since three BMTME models are fitted (unstructured, diagonal, and standard) the values of the ranks ranged from 1 to 3, and the lower the values, the better the prediction accuracy. For ties, we assigned the average of the ranks that would have been assigned had there been no ties.

Data and codes repository
The phenotypic and genotypic information of the two data sets included in this study as well as the R package for performing the analyses can be downloaded from the link: http://hdl.handle.net/11529/10646. This link contains the phenotypic data on maize (Data.maize) and wheat (Data.trigo), as well as genomic data on maize (G.maize) and wheat (G.trigo). Also, the link includes the BMTME.zip with the R package used to perform the analyses under the BMTME model.
The BMTME R package Nowadays the R programming language is a popular tool in statistical science for analyzing and visualizing data (R Core Team, 2015). However, in the context of big data with complex models, the speed of R is slow. For this reason, many times R is combined with C++ codes to produce high-performance programs that considerably increase the speed of programs (Stroustrup 2000;Eddelbuettel and Sanderson 2014). The R package we developed for fitting the BMTME models merges R and C++ through the use of Rcpp together with Armadillo C++ library (Sanderson 2010;Eddelbuettel 2013). Appendix E describes how the three-way data should be arranged and Appendix F explains the basic input needed to run the routines built in the R package for fitting the BMTME.

Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

RESULTS
Results for the simulated data sets and for the real data sets (maize and wheat) are shown below. Table 1 gives the posterior mean and posterior SD for b coefficients (b T ) for each trait and for the variance-covariance matrices (S t ; R e ; S E ). The estimates of the posterior means for the b coefficients (b T ) and for the variance-covariance matrices (S t ; R e ) are very close to the true values, while the estimates of the diagonal covariance matrix (S E ) are slightly overestimated. Although the diagonal covariance matrix of S E is slightly overestimated according to the performed simulation study, we have evidence that the proposed BMTME model does reasonably well in terms of parameter estimation. We also tested the proposed BMTME model with another set of parameters and our results agree with the above mentioned results. Table 2 shows the resulting prediction accuracy (Correlation and MSEP) for each environment-trait combination for the two simulated data sets; we also present the ranking of the BMTME model under the three conditions (unstructured, diagonal, and standard) for each environment-trait combination. Based on the ranking given in Table  2, the best prediction accuracy for both data sets with low and high correlation between traits (using both criteria) was achieved when the model assumed an unstructured variance-covariance matrix for both S t and R e , followed by the second condition, which assumes a diagonal matrix for S t and R e in terms of MSEP, but for the standard condition in terms of the Pearson correlation. In terms of the Pearson correlation for both data sets (low and high correlation between traits), in five of the nine environment-trait combinations, the unstructured condition performed better in terms of prediction accuracy, while the standard condition performed better in three of nine environment-trait combinations, and the diagonal condition performed better in only one of nine.

Simulated data set
In terms of MSEP, the unstructured BMTME performed better in five (low correlation between traits) and six (high correlation between traits) of the nine environment-trait combinations, the diagonal BMTME in only four (low correlation between traits) and two (high correlation between traits) of nine combinations, and the standard BMTME in zero (low correlation between traits) and one (high correlation between traits) of nine combinations. Regarding the average of the nine groups (environment-trait combinations) for both prediction criteria (correlation and MSEP), the unstructured BMTME gave the best prediction, correlation = 0.57 (low correlation between traits), and correlation = 0.67 (high correlation between traits) and MSEP = 1.06 (low correlation between traits) and MSEP = 1.07 (high correlation between traits). In both data sets, the unstructured BMTME model had the best prediction accuracy; however, the higher the correlation between traits, the higher the prediction accuracies observed, since the average correlation between traits under the unstructured BMTME was 17.5% higher when the correlation between traits was 0.85 compared to when it was 0.2. Table 3 shows that for each trait there are moderate differences between the b coefficients between environments. For Yield and PH, the largest and smallest b coefficients were observed in environments E1 and E2, respectively, while for trait ASI, the largest b coefficient was observed in E3 and the smallest in E2. The genetic estimates of the variance-covariance components of traits are given inŜ t , where the correlation between traits is moderate. Yield and ASI have a negative correlation (20.27), and the correlation between ASI and PH is also negative (20.25), while the correlation between Yield and PH is 0.41. The same tendency is observed in the residual correlation between traits but with smaller correlation between traits. Table 4 shows the prediction accuracies (Correlation and MSEP) for each environment-trait combination and the ranking of the three conditions studied for each criterion in the maize testing data set for crossvalidation CV1. From the ranking, the best condition is the standard model, since it was the best in five of nine environment-trait combinations in terms of correlation, while in terms of MSEP, the diagonal model was the best in three of nine environment-trait combinations. As for the averages of the environment-trait combinations, the standard model was also the best in terms of both criteria. The second-best model was the diagonal, and the unstructured model was the worst in n terms of both criteria. This can be explained by the low correlation between traits that exists for this maize data set. Table 5 provides the results of cross-validation CV2 for the maize testing data set. The trait yield is unobserved in only one environment (for example, E1) for all lines, but data on the other two traits are available for this environment (E1), as well as for the other two environments (E2 and E3). The best model in terms of correlation and MSEP for predicting Yield for all lines in E1 was the standard model (0.215, 41.714), followed by the diagonal (0.168, 43.691) and the unstructured model (0.163, 46.407). In E2-Yield and E3-Yield, the best BMTME model was the unstructured model with Pearson correlation. In terms of MSEP, the BMTME unstructured model was the best model for predicting the Yield of the unobserved lines in E2, followed by the other two models (diagonal and standard). For E3-Yield, the best predictive model in terms of MSEP was the BMTME standard, followed by the unstructured model.

Maize data set
Wheat data set Table 6 shows that b coefficients are very different between traits and environments for the wheat data set. In environment Bed2IR, the largest b coefficients were observed in traits DTHD and GNDVI, respectively, while in environment Drip, the largest b coefficients were observed in traits GNDVI and GRYLD, respectively. The genetic estimates of the variance-covariance components of traits are given inŜ t , where the largest correlations were observed between trait DTHD vs. GNDVI, GRYLD, and PTHT; the same is true for the residual corre-lation between traits (R e ). In terms of prediction accuracies for the entire data set, they are high in terms of correlation and less precise in terms of MSEP mostly for trait PTHT in the three environments. Table 7 gives the prediction accuracy of the wheat data set for the testing data set for each environment-trait combination; it also gives the ranking of the three conditions studied under both criteria for cross-validation CV1. The best case is when the BMTME model assumes an unstructured variance-covariance matrix for both S t and R e and a diagonal matrix for the variance-covariance for S E ; followed by BMTME with a diagonal matrix for S t; R e , and S E : As for the ranking in terms of Pearson correlation, in 6 of 12 groups the BMTME unstructured model performed better in terms of prediction accuracy, while the BMTME diagonal model was the second-best model since it was the best model in 3 of 12 cases; the BMTME standard model was the worst model in terms of prediction accuracy since it was the best in only 1 of 12 cases. In terms of MSEP, the BMTME unstructured model performed better in 5 of 12 cases, the BMTME diagonal model was the best model in only 3 of 12 cases, and the BMTME standard model was the best in 1 of 12 cases. The BMTME unstructured model also had the best average prediction accuracy of the 12 groups. Table 8 gives the results of cross-validation CV2 which assumes that trait GRYLD is lacking in one environment for all lines but not in the other environments. The results given are only for the testing data set (trait GRYLD missing for all lines in one environment). The best model for predicting GRYLD for all lines in environment Bed2I with Pearson correlation was the BMTME unstructured model, followed by n Posterior mean and SD of the b coefficients (b) for four traits (DTHD, GNDVI, GRYLD, and PTHT) in three environments (Bed2I, Bed5I, and Drip). Estimated variancecovariance components for traits (Ŝ t ) and for residual (R e ). InŜ t , andR e the upper triangle contains the variance-covariance components and the lower triangle contains the correlations.
the BMTME standard and, in the last position, the BMTME diagonal model. In terms of MSEP, the results are exactly the opposite. While in environment Bed5I the best predictive model in terms of Pearson correlation was the BMTME standard, then the BMTME diagonal, and, in the last position, the BMTME unstructured model. In terms of MSEP, the best model was the unstructured model, then the standard, and, at the end, the diagonal model. In environment Drip, the ranking of models based on both criteria was as follows: BMTME unstructured, BMTME diagonal, and BMTME standard.

DISCUSSION
To our knowledge, this is the first statistical three-way genomic model for assessing the prediction accuracy of trait · genotype · environment.
Other models for assessing multi-traits or multi-environments have been extensively studied in the related literature (see, for example, Jarquín et al. 2014;Montesinos-López et al. 2015); however, none of them have simultaneously assessed and modeled the three-way variance-covariance structure. The BMTME model does this task simultaneously using Bayesian estimation and the package for performing such a task is given in this article.
Performance of the BMTME model in simulated and real data sets In the simulated data sets, the best prediction accuracies were achieved with the BMTME model (which assumes an unstructured variancecovariance matrix for the genetic and residual components) even when the correlation between traits was low, followed by the model that assumed a diagonal variance-covariance matrix for both matrices (in terms of the Pearson correlation) of traits, and then by the standard model, which was formed by an identity matrix multiplied by s 2 t and s 2 e for the genetic and residual variance-covariance matrices, n Mean and SE of the estimated correlations and MSPE from the 10-fold cross-validation CV1. The BMTME model was fitted using unstructured (U), diagonal (D), and standard (S) variance-covariance matrices. Environment (Bed2I, Bed5I, Drip)-trait [days to heading (DTHD), GNDVI, grain yield (GRYLD), and plant height (PTHT)] combination. Method stands for the three variance-covariance matrices used with the BMTME. a Since three BMTME models are fitted (unstructured, diagonal, and standard), the values of the ranks ranged from 1 to 3, and the lower the values, the better the prediction accuracy. For ties, we assigned the average of the ranks that would have been assigned had there been no ties.
respectively. The simulation study provides evidence that when the correlation between traits is high, it is really important to use a multivariate model that takes into account this correlation to improve prediction accuracies. This evidence is also supported by the results obtained with the wheat data set, where the BMTME unstructured model was the most accurate model, followed by the diagonal and finally by the standard model. However, with the maize data set, we did not observe any gain using the unstructured variance-covariance matrix in comparison to the other two variance-covariances used (diagonal and standard), maybe because in this data set the genetic and residual correlations between traits were low. Therefore, the important message is that when the correlation between traits is high (.0.5), it is really important to estimate the unstructured variance-covariance matrix; when this correlation is low, it is enough to use the BMTME standard model because with the unstructured model, the results could be worse than those of the standard model. These suggestions are not new; they were also made by Calus and Veerkamp (2011), Jia and Jannink (2012), Guo et al. (2014), andJiang et al. (2015) in the context of multi-trait analysis. Here we only point out that they are also valid in the multi-trait, multi-environment context, taking into account the T · G · E interaction term.
Our contribution added to the traditional multi-trait model (proposed by Calus and Veerkamp 2011;Jia and Jannink 2012;Guo et al. 2014;Jiang et al. 2015) is that our model also is valid for the multienvironment and the three-way (T · G · E) interaction term, which more realistically mimic the type of data that are very common in plant breeding programs, where genotypes are evaluated for multi-traits in multi-environments. We are also aware that normally distributed traits are not the only traits commonly measured in plant breeding programs. For this reason, models for multiple categorical ordinal traits, multiple count traits, or a mixture of types of traits are also needed to help breeders improve the process of selecting candidate genotypes.
Prediction assessment of the BMTME model We introduced a Gibbs sampler for Bayesian analysis of multi-traits and multi-environments that takes into account the three-way (T · G · E) interaction term that uses simple conditional distribution to simulate the joint posterior distribution of all required unknown parameters in the WGP model. This model has the advantage that it uses Half-t priors on each standard deviation term and uniform priors between 21 and 1 on each correlation of the covariance matrix of traits in order to achieve noninformativity and posterior inferences with low sensitivity to the choice of hyper-parameters for the variance-covariance matrices.
Since we modeled the correlation patterns separately for each factor as G 1 ¼ G g 5S t and G 2 ¼ S E 5G g 5S t , this facilitates the interpretation of the contribution of every factor to the overall correlation structure. It also allows choosing specific covariance structures for each factor, which improves accuracy and makes model fitting easier. In addition, fewer parameters than an unstructured model are required. For example, for modeling G 2 under an unstructured model, we need to estimate IJLðIJL þ 1Þ=2 unknown parameters; this number of parameters is larger than the number of parameters required to be estimated using Kronecker products for a three-factor separable model that only needs IðIþ1Þ 2 þ JðJþ1Þ 2 þ LðLþ1Þ 2 2 2 parameters. In our context, the number of parameters is lower, since we assumed a diagonal matrix for the variance-covariance matrix of environments and the matrix G g is given. Also, if needed, partial derivatives, inverse computation, and Cholesky decomposition of the overall covariance matrix are performed more easily on the factor-specific covariances because they have smaller dimensions. Therefore, the use of separable covariance matrices with Kronecker products has substantial computational advantages, besides improving interpretation and model fitting (Simpson et al. 2014). However, care needs to be exercised with the assumption of a Kronecker product structured variance-covariance matrix, especially in three-way multivariate data, because incorrect assumptions may lead to invalid conclusions (Roy and Leiva 2008).

Contributions and limitations of the BMTME model
This study clearly described the full conditional distributions for modeling the three-way (T · G · E) interaction term with multi-traits and multi-environments, which is of paramount importance for evaluating genotypic performance in target environments and for predicting yetto-be observed phenotypes when the relative performance of genotypes varies across environments. Because the proposed model takes into account the correlation between traits and includes the three-way (T · G · E) interaction term, the BMTME can be a useful tool for efficiently selecting superior genotypes. The proposed BMTME model can be considered a Bayesian GBLUP for multiple traits and multiple environments since the marker information is taken into account in the GRM (G g ). Some of the advantages of our model over standard software are: (a) it is able to estimate separable covariance matrices of the form A5B5C, which is not possible with other software; (b) the estimation of three-way terms with covariance matrices of the form A5B5C is more parsimonious since fewer parameters are needed than when two factors are joined and the estimation process is performed using two separable covariance structures as A5B Ã , where B Ã contains the covariance of the two factors B and C; (c) the convergence of our model is not a big deal compared to the convergence problems of other software for complex data; and (d) our model facilitates the interpretation of the covariance matrices because we can estimate the three covariance matrices.
On the other hand, as expected, the disadvantage of the BMTME model is its high computational cost even under the optimized C++ developed and made available in this research article. Large numbers of lines might indeed cause some delays in the computation of such large numbers of parameters in the full conditionals. However, constant developments in computing science will soon reduce the computing time of the three-way BMTME model.
Finally, our proposed BMTME model can also be useful (a) in QTLmapping studies, since some WGP methods are also commonly used n for GWAS (Peters et al. 2012;Garrick and Fernando, 2013;Jiang et al. 2015), and (b) to include spatial information in the residual R matrix of the proposed model. This information is often available from breeding programs since they measure geographical information of the plots where genotypes are tested in each environment; this could help improve prediction accuracy.

Conclusions
In this paper, we extended the multi-trait WGP model to the multi-trait and multi-environment WGP model. This unified WGP model takes into account the correlation between traits and the three-way interaction term (T · G · E). Additionally, a transparent derivation of all full conditional distributions required is given that allows us to propose an efficient Gibbs sampler that is easy to implement and produces precise parameter estimates with high noninformativity and posterior inferences with low sensitivity to the choice of hyper-parameters for the variance-covariance matrices. Finally, we successfully applied the proposed method to simulated and real data and found that when the correlation between the traits is high (.0.5), the proposed BMTME model with an unstructured covariance matrix should be preferred over the diagonal and standard methods to help improve prediction accuracy. However, when correlations are low, it is enough to use the BMTME standard model because if we use the unstructured model, the results could be worse than those of the standard model. The R-software package BMTME offers specialized and optimized C++ routines to efficiently perform the analyses under the proposed model.

ACKNOWLEDGMENTS
We very much appreciate the International Maize and Wheat Improvement Center (CIMMYT) field collaborators, laboratory assistants, and technicians who collected the phenotypic and genotypic data used in this study.

LITERATURE CITED APPENDIX A
Derivation of full conditional distributions for the BMTME unstructured model e . Also, if we had assumed PðbÞ}1 as prior for b, we would have maintained a multivariate Normal posterior distribution due to the multivariate Normal distribution's conjugacy. However, the mean vector and covariance matrix would be slightly modified.
Full conditional for b 1 Defining h 1 ¼ Xb þ Z 2 b 2 ; the conditional distribution of b 1 is given as In a similar way, by defining h 2 ¼ Xb þ Z 1 b 1 , we arrive at the full conditional of b 2 as Full conditional for S t PðS t ELSEÞ } Pðb 1 jS t ÞPðS t a1; . . . ; a L ÞPðb 2 jS t ; . Obtained using ðB T 5AÞvecðXÞ ¼ vecðAXBÞ.
Full conditional for s 2 Ei ; with i ¼ 1; ::; I: Full conditional for a Ei ; with i ¼

APPENDIX B
Derivation of full conditional distributions for the BMTME diagonal model All full conditional distributions of the BMTME diagonal model are the same as those of the BMTME unstructured model, except those needed for the variance-covariance (S t ; R e Þ; which are now diagonal. For this reason, here we provide the variances of the diagonal elements of S t (s 2ðlÞ t ; with l ¼ 1; . . . ; LÞ, R e (s 2ðlÞ e ; with l ¼ 1; . . . ; L) and the required elements of a and a e .
APPENDIX E Data preparation for analysis with I environments, J lines, K replications, and L traits is shown in the table below. gid denotes unique lines name, env are the environments, rep denotes the replications, and resp represents the response variables.