Optimal models in the yield analysis of new flax cultivars

Abstract: Multi-environment trials are conducted to evaluate the performance of cultivars. In a combined analysis, the mixed model is superior to an analysis of variance for evaluating and comparing cultivars and dealing with an unbalanced data structure. This study seeks to identify the optimal models using the Saskatchewan Variety Performance Group post-registration regional trial data for flax. Yield data were collected for 15 entries in post-registration tests conducted in Saskatchewan from 2007 to 2016 (except 2011) and 16 mixed models with homogeneous or heterogeneous residual errors were compared. A compound symmetry model with heterogeneous residual error (CSR) had the best fit, with a normal distribution of residuals and a mean of zero fitted to the trial data for each year. The compound symmetry model with homogeneous residual error (CS) and a model extending the CSR to higher dimensions (DIAGR) were the next best models in most cases. Five hundred random samples from a two-stage sampling method were produced to determine the optimal models suitable for various environments. The CSR model was superior to other models for 396 out of 500 samples (79.2%). The top three models, CSR, CS, and DIAGR, had higher statistical power and could be used to access the yield stability of the new flax cultivars. Optimal mixed models are recommended for future data analysis of new flax cultivars in regional tests.


Introduction
Flax (Linum usitatissimum L.) is an important oilseed and fibre crop. Oilseed flax (also called linseed) is a rich source of micronutrients, dietary fiber, and omega-3 fatty acids. Flax breeding objectives are to increase yield (yield stability), improve agronomic performance, enhance disease resistance, and improve quality characteristics (You et al. 2016). The ambition of the Canadian flax industry is to increase average yields by nearly 10% over the next decade with the help of improved agronomic management and enhanced genetics (new cultivars). Elite flax breeding lines supported for registration and new flax cultivars are evaluated in regional yield trials over a number of locations and years to determine their performance (yield and yield stability) in different agro-ecological zones of the province of Saskatchewan. The statistical methods employed to design the trials and analyze data from the regional tests to evaluate yield and yield stability need to be accurate, efficient, and informative.
New crop genotypes are often evaluated and expressed as a percentage of a designated check genotype value over a number of locations and years. This is a relatively simplistic statistical method and some limits with this traditional approach include the selection of an appropriate check genotype or genotypes. This comparison requires that check genotype(s) have consistent performance over a number of locations and years (Friesen et al. 2016). Genotypes under evaluation and check genotypes might change from year to year as well as location to location. Currently, the statistical analysis for yield data generated from flax regional performance test utilizes analysis of variance (ANOVA) (You et al. 2013). Analysis of variance takes all factors as fixed effects, assumes homogeneity of variance and the same covariance for all pairs of entries, and does not handle unbalanced data, which leads to estimates that are often inappropriate and unreliable . In contrast, the mixed model analysis method considers fixed and random effects in many commonly used experimental designs and correctly estimates genotype effect (G) and genotype × environment (G × E) interaction effects using an appropriate variance-covariance structure that can be adapted to a complicated data structure. This model allows for the presence or absence of heterogeneity or error variances among environments, heterogeneity of genotypic variances among environments, and heterogeneity of genotypic correlations between pairs of environments (Piepho 1997a(Piepho , 1997bSo and Edwards 2009;Hu and Spilke 2011).
In cultivar performance tests, a common case is that some entries are evaluated for a number of sites and years and then replaced by newer entries or advanced lines. As well, poor growing environments in certain locations or years lead to missing observations. Analysis of variance is often unable to estimate least squares means and compare cultivar performance for unbalanced data, whereas a mixed model analysis can provide an appropriate analysis producing correct results.
Homogeneity of variance and covariance are unrealistic in many circumstances. More complex and informative mixed models can account for variance and covariance heterogeneity in multi-environment trials (MET; Kempton 1984;Piepho 1998aPiepho , 1998b). The mixed model has been shown to be more efficient than the ANOVA model for estimating the accuracy of varietal differences. In practical applications, the choice of the optimal model depends on the experimental design, environments, genetic pattern of species or traits, required precision, and the type of desired information (So and Edwards 2009).
The mixed model has been applied to the analysis of cultivar trial data for many crops such as soybean (Andrade et al. 2016), corn (So and Edwards 2009;Hu and Spilke 2011), wheat (Friesen et al. 2016), rape , and canola (Beeck et al. 2010). Results are not always consistent due to differences in crop cultivars, trial environments, adopted variance-covariance structure of models, and data sufficiency. A great deal of theoretical research has been conducted with respect to multienvironment statistical methods (Finlay and Wilkinson 1963;Eberhart and Russell 1966;Shukla 1972;Hildebrand 1984;Piepho 1997b;Smith et al. 2001aSmith et al. , 2001bBasford et al. 2004;Smith et al. 2005;Littell et al. 2006;Coe 2007;Hu and Spilke 2011;Hu et al. 2014). Even though no one-size-fits-all model can be applied to all situations, the general suggestion is that more models must be tested and compared to find the optimal one; indeed, one or a few models might be preferred to deal with multiple environments for a certain crop. Simulated datasets used to compare mixed models might not fit all practical cases well for MET (Walker et al. 2011;Ferraudo and Perecin 2014). Some models have been employed to study a certain number of real datasets but, because the number of datasets was limited, drawing generic conclusions on the optimal models for some crops was difficult. A selection approach for models could be established based on testing a variety of models with different variance-covariance structures and using multi-year-location trial data and enough samples with real data for a particular crop.
In cultivar trials, the residual error variance commonly differs across environments due to changes in growing environments such as soil, weather, field-scale experimental conditions, and management across locations and years. The heterogeneity of residual error variances in analytical models has been considered in a number of studies (Casanoves et al. 2005;So and Edwards 2009;Hu and Spilke 2011;Hu 2014;Hu et al. 2014). Despite different options for appropriate models, heterogeneity of error variances was prevalent and prominent in MET and affected parameter estimates of models and the stability of the ranking of genotypes. Models with heterogeneous residual error variances fit the trial data better and gave smaller standard errors of model parameter estimates than their homogeneous counterparts. Models with heterogeneous residual error variances were a good alternative analysis for genotype stability in MET (So and Edwards 2009;Hu and Spilke 2011;Hu 2014;Hu et al. 2014).
The purpose of this study was to explore the potential advantages of the mixed model method and find optimal models with preferred variance-covariance structures based on a sufficient number of practical datasets of new registered flax cultivars in post-registration (regional) testing in Saskatchewan. A total of 16 models with different variance-covariance structures were compared. A two-stage sampling technique was used for diverse environments with variable years and locations for nine years and up to 10 locations. A total of 500 samples were used to test the models and estimate and predict the main effect of genotype yield and genotype ranking. The results led to recommendations for optimal mixed models for the analysis of regional trial data and assessment of the performance of new flax cultivars.

Experimental design
Flax yield data generated from post-registration trials conducted in Saskatchewan from 2007 to 2016 (except 2011) were used. Fifteen new cultivars were grown in up to 10 locations in different years. New cultivar evaluation was conducted in a complete random block design with three replications at each location.
Each entry was planted in seven-row plots each 2 m long and 1.5 m wide. CDC Bethune was employed as a check cultivar for all year-location trials. The center five rows of each plot were collected for grain yield assessment. Yield was adjusted to grams per square meter. As is common in crop genotype multi-year-location trials, the dataset was unbalanced as some entries were evaluated for a number of sites and years and then replaced by newer entries, or entries were missing due to poor growing conditions or other reasons. All genotypes were represented by data from at least two years and two locations or else excluded from further analysis. As such, at least two years of data were available for all trial locations. A total of 1907 data points were included in the analysis.

Mixed models
The statistical models evaluated in this study were based on a mixed linear model: where y ij is the yield of the ith cultivar in the jth environment, μ is the general mean, α i is the effect of the ith cultivar, β j is the effect of the jth environment, (αβ) ij is the interaction effect of the ith cultivar in the jth environment, and e ij is the residual error. The factor α i is considered fixed, while β j , (αβ) ij , and ε ij are treated as random. In our model, the effect μ contains both fixed cultivar effects as well as the random effects of the environment and the cultivar × environment interaction.
In the ANOVA model, β j and (αβ) ij are assumed to be independent and normally distributed with variances σ 2 β and σ 2 αβ , respectively, which implies that all cultivars have the same variance over environments [i.e., Covðμ ij , μ i 0 j Þ = σ 2 β ]. Genotypic variance and covariance between pairs of environments are homogeneous. This corresponds to the compound symmetry (CS) variance-covariance structure in the mixed model. To simplify the mathematical expression, a 3 × 3 variance-covariance matrix was taken as the general form, which was simply extended to higher dimensions: Assuming variance heterogeneity among cultivars, the higher dimension (DIAG) structure was used in mixed models: Some pairs of cultivars responded more similarly in given environments than others. This implies a correlation between cultivar performance and environment. In this situation, a correlation variance-covariance structure (COR) was considered in the analysis models: Data points close in time are assumed to be more highly correlated than data points distant in time. An autocorrelation model (ARMA) was used to fit in trial data for years: A factor analytic (FA) variance structure can be used to implement complicated data analysis and depends on the decomposition of an unstructured variancecovariance matrix in the mixed model. The FA structures are fitted with different numbers of components. Three components (FA1, FA2, and FA3) were applied in this study. The simple FA1 model with one component is: One of the most general variance-covariance forms is that no limits are imposed on the variance and covariance structure; variance and covariance are allowed to vary freely. This is a so-called unstructured model (US), and seems realistic in most cases. Because many parameters need to be estimated for this structure, it sometimes does not fit datasets well. The unstructured model has a matrix form in mixed models as follows: Heterogeneity of residual error variances directly affects the analysis of trait performance. Models allowing error variance heterogeneity are often more practical and have better fitness. In this study, all models considered two cases: heterogeneous (σ 2 ε1 ≠ σ 2 ε2 ≠ σ 2 ε3 ) and homogeneous (σ 2 ε1 = σ 2 ε2 = σ 2 ε3 ) environments. The names and notations for all of the mixed linear models with homogeneous and heterogeneous environments are listed in Table 1.

Model assessment
The choice of model directly affects the assessment of cultivar performance. Sixteen models with different variance-covariance structures were compared using the Akaike information criterion (AIC) (Oman 1991;Wolfinger 1993), for which a lower value indicates a better model. The CS model was taken as the baseline and the chi-squared test was used to statistically assess the fit of the two models, where General data analysis was conducted using R packages and Python, and the models were fitted using ASReml-R v3.0 and ASRemlPlus v4.0 (Bulter et al. 2007).

Two-stage sampling
Two-stage random sampling was employed; the first stage samples were randomly produced from those that had a minimum of five trial years and the second stage sampled by location where there was at least two locations for two years. Five hundred samples were finally produced, with an average sample size ranging from 780 to 1369 (Table 2).

Characteristics of the post-registration flax cultivar dataset
The yield varied greatly for new flax cultivars in the post-registration testing across year-location environments (Fig. 1). The highest average and the greatest range of yield occurred in 2009, the lowest average yield was in 2012, and the average yield was relatively stable from 2014 to 2016. The pattern of change in yield was similar for new cultivars but the response to environment (location and year) was complex. Moreover, there were interactions between cultivars and environments. Yearlocation environments had a substantial impact on yield performance. The dataset was unbalanced because of new entries added, older entries removed, or missing observations across locations and years. This meant that

Estimation of residual error variance for a single year-location environment
The residual error variance for each year-location environment was estimated using mixed model analysis (Fig. 2). The error variance was estimated to be 5195.39 at Watrous, SK, in 2013, which was over 880 times larger than the error variance at Saskatoon, SK, in 2007. The residual error was not homogeneous and thus models that correctly estimate error variance need to be considered. The mixed model allowed for heterogeneous error variance and could handle the complex data in MET and correctly estimate residual error variance.

Assessment of mixed models with different variance-covariance structures
CSR was the best mixed model with the smallest AIC value (12 068.51) for the post-registration flax cultivars dataset (Table 3). DIAGR was the next best model with an AIC of 12 075.44. The CS mixed model had relatively good performance but the CSR and DIAGR models fit the data comparatively better (p = 3.03 × 10 −5 and p = 8.05 × 10 −5 , respectively). CSR performed significantly better than DIAGR (p = 0.03706).

CSR model performance diagnosis
Model performance was diagnosed using standardized residuals, Q-Q plots, and predicted value distribution. Among 1907 data points, 78 cases (4.1%) had standardized residuals with an absolute value >2.0 for the CSR model. Q-Q plots highlight deviations from normality. For datasets with a good normal distribution, most points are located on the diagonal line. The CSR model fit the MET dataset with very close to a normal distribution for all years (Fig. 3). The residual plot exhibited a random array of dots evenly dispersed around the zero line for all years (Fig. 4). The CSR model best represented the complete trial information for all years and locations and fit the trial data and estimated the residual errors without bias.

Cultivar ranking with the application of 16 models
The ranking of new flax cultivars in terms of yield and yield stability across multiple environments is important. Figure 5 shows cultivar rank based on the different models. All models had a similar pattern and the rankings were highly correlated. The CSR and DIAGR models exhibited the best performance and generated a consistent cultivar order. All models resulted in similar estimation of cultivars with lower rankings. The order of cultivars with moderate yield levels was uncertain when models with different variance-covariance structures were applied. The relative ranking of the check cultivar CDC Bethune has a large impact on the assessment of new flax cultivar regional testing; here, its rank varied from second to fifth depending upon the model applied.
Overall, the mixed model analysis allows for a more equitable comparison than ANOVA amongst new and old cultivars across locations and years in the Saskatchewan Flax Variety Performance Test.

Mixed model performance for multiple samples with variable environments
While one model with the best performance was considered, the CSR model was the optimal model, with 396 times among the 500 samples (79.2%) ( Table 4). The CS model occurred 58 times (11.6%) as the best performance and the DIAGR model 27 times (5.4%). If the top three models were considered as potential preferred models, the CSR, CS, and DIAGR models best fit the dataset with similar frequency. Given the broad environmental variance across years and locations for the flax cultivars tested, the CSR, CS, and DIAGR models can be considered to apply with the best fit in the analysis of flax trial data.   Cultivar was taken as a fixed effect factor and the Wald test indicated the significance of the statistical test among the entries (cultivars) in regional performance testing. For the CSR model, 425 out of 500 samples (85%) showed high statistical significance among the entries (cultivars) ( Table 5). Only 12 samples were statistically insignificant with p values >0.05. Thus, the CSR model had the greatest statistical power to distinguish differences between entries (cultivars) in the regional trials. The DIAGR, CS, and DIAG mixed models had similar results.

Yield rank of check cultivar CDC Bethune in 500 samples
Results were quite divergent, with different models used to access the new crop genotypes for the 500 datasets. The check cultivar CDC Bethune, which entered in the trial every year, typically varied from first to fifth in rank depending upon the model used (Fig. 6). The CSR and DIAGR models ranked CDC Bethune from third to fifth but typically in fourth place. The CS, DIAG, FA1, FA2, and FA3 models produced similar results to the CSR and DIAGR models. The COR, US, FA1R, and FA2R models placed CDC Bethune from first to fourth for most samples. The CORR, FA3R, ARMA, ARMAR, and USR models did not provide a stable assessment with respect to the rank of CDC Bethune. The position of the check cultivar in the order directly affects the cultivar assessment system. Figure 7 displays the cultivar ranks estimated using the CSR model for 500 samples. Some cultivars had a stable order in all samples, such as AAC Bravo (mostly ninth), CDC Bethune (around fourth), CDC Glas (first to third), CDC Neela (first to fourth), CDC Plava (mostly sixth), CDC Sorrel (first or second), FP2388 (ninth or tenth), NulinVT50 (eleventh), and WestLin71 (twelfth). Cultivars CDC Sanctuary and WestLin72 were quite affected by environment, with the rank being variable in response to changes in location or year. This result shows that cultivar order can be used to access cultivar stability in MET when applying an optimal mixed model for analysis.

Discussion
Crop genotypes are evaluated in MET to predict and recommend preferred cultivars or breeding lines to producers. Cultivar performance has traditionally been expressed as a percentage of a designated check genotype value for each site-year for new flax genotypes in regional trials. This is an overly simplistic statistical method and also unreliable because of the changes in performance of the check genotype and its stability in MET. Analysis of variance has been the primary analytical approach to analyze data from multi-year-location trials. Because ANOVA takes all factors as fixed effects and assumes variance homogeneity and the same covariance for all pairs of varieties, it is not practical for assessment of new flax cultivars in regional trials. In addition, the working datasets are often incomplete and unbalanced due to the addition of new entries, removal of obsolete entries, or unforeseen circumstances causing missing observations. It is difficult using ANOVA to analyze such complicated data and make the correct assessment or estimation. Mixed model analysis is superior to the traditional approach for unbalanced and multiyear-location data analysis. This study indicates that the CSR model has the best characteristics for the analysis of post-registration flax cultivar trial data. The DIAG, CS, and DIAGR models are good alternatives in some environments. These four models could be used to fit the data from flax regional trials.
This study demonstrated the large fluctuation in residual error variance across multi-year-location environments. Ignoring error variance heterogeneity in MET might result in inflation or deflation of statistical type I error when estimating genotype effects and inefficient estimates of model parameters for genotype stability evaluation. Whether or not the heterogeneity of error variances is considered in the analysis procedures considerably impacts the statistical test results regarding genotype main effects, especially in the case of random environment main effects and genotype × environment interaction effects. Cultivar assessment and recommendations depend upon the estimation of genotype main effect. In this study, the CS model with heterogeneous error variances (CSR model) had the best statistical significance for the estimation of cultivar main effect; only 12 out of 500 samples (2.4%) were statistically insignificant for cultivar main effects. Good statistical power was also evident for both the CSR and DIAGR models. Which models best fit the crop genotypes in MET? Inconsistencies in many studies are often due to different crop species, trial years and locations, field operation and management, and unknown factors. So and Edwards (2009) found that the best model was one with some level of heterogeneity in the covariance structure and the most important level of heterogeneity was the residual error variance in the maize trials. Only  The unstructured model without any parameter limits is generally taken as the best model, although is not favored by some researchers (So and Edwards 2009;Hu and Spilke 2011). Although an unstructured model is certainly the most realistic in the majority of cases and   should have good performance for fitting practical data against simpler structures, the disadvantage is the large number of parameters that need to be estimated. Based on the parsimony principle, it was not a good fit for the trial data in this study. Other variance-covariance structures were specific forms of the unstructured model with restrictions imposed and, in most cases, these are more suitable for practical crop trial data. Hu et al. (2014) point out that models with heterogeneous residual error variances fit the trial data better and gave (with a 2.1%-8.4% reduction) smaller standard error of model parameter estimates than their homogeneous counterparts. Mixed, AMMI, and Eberhart-Russel models have been used to study genotype × environment interactions in sugarcane, where the mixed model exhibited higher sensitivity in 63 simulated cases representing different conditions (Ferraudo and Perecin 2014). Only a few of the FA models in this study had the best performance. The CS model with homogeneous error variance was not the worst model despite the strict structure and parameter limits. In addition, models with fewer parameters and a simpler structure were more suitable to implement on the limited number of datasets than other unfitted models. Although, the model statement includes the environment effect (β j ) and the interaction effect of genotypes and environment (αβ) ij . Environment involves many factors such as location, field topography, soil, climate (precipitation, cumulative temperature, and day length), and site management. In this study, it was not possible to ascertain the environmental and (or) management factors that may have been important to the variation in yield of flax. Zhang et al. (2014) utilizes structural equation modelling (SEM) to relate multiply phenotypic traits in flax accessions with seed yield; a similarly aligned study conducted with this dataset could determine the main yield drivers in flax by relating historical crop yield data with weather data (for sites where this data is available).

Conclusion
The mixed model is suitable for the analysis of complicated data in MET. The choice of optimal model relies on varying assumptions and variance-covariance structures. Residual error variances should be included in the data analysis. The CSR model best fits the flax postregistration trial dataset studied herein and had the highest statistical power. Overall, CSR, DIAGR, and CS are the preferred variance-covariance models and are recommended for future data analysis of new flax cultivars in regional tests.