Evaluation of finger millet (Eleusine coracana (L.) Gaertn.) in multi-environment trials using enhanced statistical models

Spatial variation and genotype by environment (GxE) interaction are common in varietal selection field trials and pose a significant challenge for plant breeders when comparing the genetic potential of different varieties. Efficient statistical methods must be employed for the evaluation of finger millet breeding trials to accurately select superior varieties that contribute to agricultural productivity. The objective of this study was to improve selection strategies in finger millet breeding in Ethiopia through modeling of spatial field trends and the GxE interaction. A dataset of seven multi-environment trials (MET) conducted in randomized complete block design (RCBD) with two replications laid out in rectangle (row x column) arrays of plots was used in this study. The results revealed that, under the linear mixed model, the spatial and factor analytic (FA) models were efficient methods of data analysis for this study, and this was demonstrated with evidence of heritability measure. We found two clusters of correlated environments that helped to select superior and stable varieties through ranking average Best Linear Unbiased Predictors (BLUPs) within clusters. The first cluster was chosen because it contained a greater number of environments with high heritability. Based on this cluster, Bako-09, 203439, 203325, and 203347 were the top four varieties with relatively high yield performance and stability across correlated environments. Hence, scaling up the use of this efficient analysis method will improve the selection of superior finger millet varieties.


Introduction
Cereals are the main source of nutrition for the majority of people on earth. Millets are smallseeded species of cereal crops that are members of the Gramineae family [1]. Small-seeded grasses produced for food, feed, or forage comprise the diverse group of millets of which finger millet (Eleusine coracana (L.) Gaertn.) is the fourth most important after sorghum, pearl millet, and foxtail millet [2,3]. Finger millet is an allotetraploid (2n = 4X = 36, AABB) belonging to the family Poaceae and genus Eleusine [4]. Finger millet is the most climate resilient millet and can be grown under a wide spectrum of extreme climatic conditions. Thus, they can be termed "farmer friendly", providing better returns in comparison to other crops when subjected to changing climatic conditions [5]. Globally, 3,834,021 tons of finger millet grain are produced per annum [6]. In Ethiopia, 480,852 hectares of finger millet yield 1,327,267 tons annually [7].
Even though finger millet is crucial for livelihoods and food security, its productivity is far below the potential yield attained under experimental settings [10]. This is attributed to a variety of biotic and abiotic challenges coupled with socioeconomic limitations to adopt modern farming technologies. Finger millet blast disease, the most damaging constraint, is caused by Pyricularia grisea (Magnaporthe grisea), and can reduce yields by 7 to 54 percent depending on climatic factors and cultivar susceptibility [11]. In addition, pink stem borer (Sesamia inferens (Walker)), and finger millet root aphid (Tetraneura nigriabdominalis (Sasaki)] are notable insect pests of the crop [12,13].
Spatial variation and GxE interaction are common in varietal selection field trials and are a central problem confronting a plant breeder when comparing the varieties' genetic potential. These two factors can strongly bias variety estimates and result in large standard errors. Plant breeding trials usually involve a large number of test entries covering extensive areas where spatial variation is likely to confound reliable prediction of genetic values. This is particularly challenging in early generation variety trials with limited genetic material and replications [14]. Efficient approaches that account for more complex environmental variation require complementing experimental designs with appropriate models of analysis [15][16][17].
Spatial analysis describes an analysis in which the variance of each trial is examined and a suitable structure is used to estimate the trial's effects. Rather than obviate the need for a good experimental design, this approach increases it, because once a treatment effect is confounded with an environmental effect, the two cannot be disentangled [18]. Factor analytic multiplicative mixed (FAMM) models were used by Smith et al. [19,20] to extend the genotype and genotype × environment (GGE) analysis. Its significance for the estimate of the related variance structure for GxE effects is a crucial component of the factor analytic (FA) model for multi-environment trials (MET) data, since it offers a good and sparse approximation to the unstructured form and is generally more computationally robust [21].
The use of efficient statistical models to produce precise and illuminating results has advanced MET data analysis, which has a long history with the earlier statistical techniques of analysis of variance (ANOVA) [20,[22][23][24]. Therefore, it is important to evaluate finger millet genotypes using more efficient statistical methods and mixed model approaches (factor analytic and spatial models under a linear mixed model framework). This study aimed to improve selection strategies in finger millet breeding in Ethiopia through modeling of spatial field trends and the genotype by environment (GxE) interaction.

Study site and experimental design
The genotypes were evaluated for yield at Bako (9º6' N; 37º9' E) and Assosa (10˚02' N; 34˚34' E) Agricultural Research Centers in Ethiopia from 2018 to 2021 in seven environments (location-year combinations). The summary of the trials are presented in Table 1, and common genotypes between environments are shown in Table 2. Bako Agricultural Research Center (BARC) receives an annual rainfall of 1,600 mm, with mean maximum and minimum temperatures of 29 ºC and 13 ºC, respectively. The mean monthly relative humidity varies from 46 to 57%, while the main rainy season from May to October, with the most rain received in July and August and short rains also received from March to June. The Assosa Agricultural Research Center (AARC) receives an annual rainfall of 1100 mm, with mean maximum and minimum temperatures of 32 ºC and 17˚C, respectively. The soil of the area is Nitosol, which is characteristically reddish to brown in color. The trends of weather data for Bako and Assosa during the study period are summarized in Fig 1. In this study, seven MET datasets were used. To minimize the special variability in determining the genetic value, all trials were conducted using a RCBD with two replications laid out in a rectangular (row x column) array of plots. Trial and environment are terms used synonymously in this study, where environment/trial refers to a year by location combination. Similarly, we interchangeably use to entry, genotype, and variety. The trait we focused on in this study is grain yield, which is expressed in tons per hectare.

Statistical models
A general model for the analysis of GxE data can be built by stacking the data vectors for each trial, y nx1 ð Þ ¼ y 0 1 . . . y 0 n � � 0 , where n ¼ P t j¼1 n j , for the j th trial. In similar manner, all the fixed and random effects in the model as well as the residuals are combined across trials. In general, consider a series of 't' trials in which 'm' varieties are grown (all trials are not necessarily balanced for varieties). A model for y nx1 , the vector of individual plot measurements combined across trials, can be written as Where τ ¼ τ 0 1 . . . τ 0 t � � 0 is the vector of fixed effects including an overall mean for each of the t trials as well as other trial-specific fixed effects, with associated design matrix, X = [X 1 . . . X t ], assumed to be full rank. A random effect including a vector of genotype by environment effects, u mt x 1 senting the presence of genotype in each environment. Note that the design matrix may contain columns whose elements are all zero if the associated genotype is not present in that trial. The additional (non-genetic) random effects, u p are formed from the extraneous trend effects in each environment (for example, random rows and columns effects) with associated design matrix Z p , and e nx1 where G g is variance matrix for genotype by environment effects, G p is the variance matrix of extraneous trend effect and R is the variance matrix of the residual errors. In the model, the variance parametric in these variance matrices are directly estimated using REML estimation method. The model can be fitted to unbalanced (an incomplete genotype by environment table) due to the assumption that u g is random, which means that interaction effects can be predicted in the absence of data.

Models for the residual effects: e
Models for the variance matrix of the residual effects R, are considered. In the mixed model MET formulation, the vector for residual effects, e, can be partitioned into residual effects is the vector of residual effects for the j (th) trial. The subsectors in e are assumed to be mutually independent, with variance matrices The model also extends the definition of the R-structure to be a direct sum structure. The residual variance matrix for each trial may be R j ¼ σ 2 j I nj if independence is assumed between field plots.
More frequently, a spatial approach is used using the [25] techniques. Assume that the trial is arranged as a rectangular array of plots in the field as for the row-column design. The variance matrix of the residuals R j can then be formed as the kronecker product of the variance structure in the row S r and column E c directions.
The most common spatial variance model for the smooth trend in field trials is a separable autoregressive process in the field row and column directions, where S r and S c are the n c x n c and n r x n r correlation matrices in the column and row directions respectively [25]. The variance matrix of the residuals for j th trial is

Models for the extraneous trend effects: u p
To model extraneous trend in a MET, the random effects, u p , can be partitioned into sub-vec- where b j is the number of random terms for trial j. These random terms are typically based on replicate blocks or rows and columns in the field. In the MET, the sub-vectors in u p are assumed to be mutually independent, with variance matrix G pj for trial j, with the block diagonal form given below. That is, there is a variance matrix for the set of extraneous effects for each trial, where The most common form for the variance matrix of these extraneous effects is a simple variance component structure, where G pj ¼ s 2 j I bj .

Models for the GxE effects: u g
The random effects for genotypes, are combined across trials to form the random genotype by environment effects, as It is assumed that the associated variance structure has the separable form where G e and G v are symmetric t × t and m × m scaled variance matrices. That is, G e holds the (scaled) variances of genetic effects at each environment and covariances of genetic effects between environments, and G v holds the (scaled) variances of environment effects for each genotype and the covariances of environment effects between genotypes.
An assumption traditionally made in MET analyses of plant breeding trials is that effects between genotypes are independent, so G v = I m , then var (u g ) = G e � I m . A simplistic model for G e sometimes used in practice is to consider the results of each trial analysis individually. This is equivalent to a model that allows for heterogeneous genetic variances across environments, but assumes independence between environments. A more plausible model for the variance matrix of genetic effects is one which includes covariance to model similarity in genetic performance between environments. In practice, it has been found that most MET data have heterogeneous covariance between environments as well as heterogeneous genetic variances [26,27].
Based on a factor analytic model, Smith et al. [19] proposed an alternative parsimonious model for variance covariance structure, Ge. This model captures the nature of heterogeneous variances and covariances found to occur in most MET data.

The factor analytic model
In order to model the GxE effects, Smith et al. [19] specified a multiplicative model of factor analytic (FA) form (order k) as where Λ (t×k) is a matrix of environmental loadings for t environments, f (mk×1) is a vector of genotype factor scores for m genotypes, and δ (mt×1) is a vector of residual genotype by environment effects.
It is assumed that where C (t×t) is a matrix of specific variances, modelling the residual GxE variance not explained by the factor model. The variance of u g is again of the form G g = G e � I m , where The form of the variance matrix of genetic effects, G e , in a factor analytic model is comprised of two parts. Products of the factor loadings ΛΛ 0 give the portion of the genetic variance explained by the factor or regression model. The inclusion of the term, δ for residual GxE captures lack of fit to the factor model, and Ψ is the matrix of specific variances not accounted for by the loadings in the FA model. This FA model accommodates both heterogeneity of genetic variance across environments (due to Λ and Ψ) and heterogeneity of genetic covariance, or correlation, between pairs of environments (due to Λ).

Heritability formula
The heritability (H 2 j ) value for the j th trial was determined using an extended technique that takes unbalanced data into account, as suggested by Cullis et al, [28].
Where Aj is the average pairwise prediction error variance of genotype effects for the j th environment, and σ 2 gj is the genetic variance at environment j.

Spatial analysis
Individual trials were subjected to separate analyses to account for non-genetic effects using the Gilmour et al. approach (1997) [25]. The initial model fitted for each trial was a randomized complete block (RCB) model with random block/replication and variety effects, and the residual correlation structure is denoted by id(Column):id(Row), where id refers to the identity matrix. This model reflects the RCB analysis based on a mixed model approach. The three key types of spatial variation were then considered over the RCBD model. Thus, a spatial model for the residuals using a separable autoregressive process in the column and row dimensions was first fitted over the RCB model for the local variation. After that, the spatial model for extraneous variation was fitted along the column and row while keeping significance terms for local variation. The Residual Maximum Likelihood Ratio (REMLR) test was used to determine whether each fitted spatial model was significant for both local and extraneous variation. Finally, the Wald test was used to test the spatial models for global variation. The presence of global and extraneous variation associated with the row and column direction can be checked with the help of a model diagnosis tool, in particular the sample variogram [25], which was used as a typical model diagnostic plot for trial Assosa19 using the Metplot package, which is a part of ASReml-R, and is depicted in Fig 2. In the row direction, the sample variogram plot (Assosa19_a) showed evidence of a global trend, with the variogram steadily increasing. This trend, as demonstrated by Stefanova et al. [29], is most likely the presence of soil fertility or other spatial gradients of the trial, implying confounded fixed covariate effects. This global trend may be accounted for in the model by employing linear or polynomial regression over a centered row number, denoted lrow. After taking into account the overall trend in the row direction, the sample variogram (Assosa19_b) in Fig 2 indicated a drop from a steadily increasing trend in the row direction. Table 3 presents the Wald test for global variation and the REMLR test for local and extraneous spatial variation. The spatial variations were appropriately modeled for the trials Assosa2018, Assosa2019, Assosa2021, and Bako2018. Thus, the local spatial trend was found to be significant for Assosa2018 and Assosa2021 along the row direction (p-value = 0.007 and 0.019, respectively), and for Assosa2021 along the column direction (p-value = 0.033). The global spatial variation was found to be significant along the column direction at Assosa2018 and Bako2018 (p-value = 0.006 and 0.011, respectively). This demonstrated how spatial variation is a crucial component of plant breeding trials that should be considered during variety trails at multi locations. The estimates of the local spatial correlation parameters at trial Assosa2021 and Bako2018 were positive (ρ2 = 0.52 and ρ3 = 0.26, see Table 3), suggesting that plots that were close together were more likely to be comparable than plots that were further apart. The negative spatial correlation value at trial Assosa2018 (ρ2 = -0.57) implied inter-plot competition, as demonstrated by Stringer and Cullisin [30].  Test for global trend after significant terms for extraneous variation and local trend were fitted b Test for extraneous variation and local trend after significant terms for global trend were fitted ρ1, ρ2 and ρ3 are estimates for autoregressive order 1(AR1) spatial correlation parameters at Assosa2018, Assosa2021 and Bako2018, respectively, and trials with empty cell (-) indicated that no spatial variation was observed. https://doi.org/10.1371/journal.pone.0277499.t003

Factor analysis
The FA models were considered for genotype by environment (GxE) analysis while keeping the spatial models provided in the individual trial analysis. The adequacy of the FA models of several k orders was formally assessed as it was fitted within a mixed model framework based on the percentage of GxE variance explained by the factor components. Table 4 presents the results from the factor analysis. It includes the total percentage of (GxE) variance explained by the model's factor components for each trial as well as for all trials. Except for trial Bako2021, the FA models fit almost all trials well, and the genetic variance was well explained by the two factor components. Overall, the two multiplicative terms of the factor analytic models accounted for nearly 98 percent of the GxE variance, with the first multiplicative term accounting for 93 percent. Bako2021's insufficient fit with the FA model suggests that the trial is not as well correlated as some of the other trials [31]. The other important result from factor analysis was a cluster analysis using a dendogram, shown in Fig 3(a), which grouped trials based on genetic correlation between them. Based on the Cullis et al. [31] suggestion that clusters formed at the dissimilarity cut-off (approximately below 0.6), the dendrogram suggested possibly three clusters of trials, where the first cluster was comprised of at most five trials. This showed that the genotype ranking was almost similar for all trials found within these formed clusters, and a different ranking of genotypes for the trials found in different clusters. Genotype selection, therefore, was performed for each cluster using average BLUPs as a selection index, provided that the formed clusters were reasonably justified for making genotype selection independently for each of the clusters.
In addition to the dendrogram, other typical summaries from the factor analysis included a heatmap of the genetic correlations between all trials. This is presented in Fig 3(b), which shows the different correlation patterns between trials. From the heatmap, we can see most of the trials were highly correlated. This indicated that genotype selection can be performed by averaging genotype means across nearly all trials in the first cluster indicated in red. There were also some trials that showed negative genetic correlations, such as Assosa2018 and Assosa2021( Table 5), suggesting that there may have been a reversal action in genotype rankings among these negatively correlated trials. The bi-plot in Fig 3(c) also demonstrated the concept of variety performance and stability across environments, as well as the discriminating power of a trial. Trials with a long arm from the center of the bi-plot had a high genetic variance compared to the others, and thus had a relatively high discriminating power for genotypes. Thus, Bako2020 and Assosa2019 had high genetic variance compared to the others. Therefore, based on the dendrogram, heat-map and bi-plot (Fig 3) as well as the genetic correlation from Table 5, we considered 3 clusters of trials (C1, C2 and C3) for trait yield, where Assosa2018, Assosa2019, Assosa2020, Assosa2021, and Bako2020 were in C1; Bako202 in C2; and Bako2018 in C3. An average of BLUPs was used as selection index to choose superior and stable varieties through ranking average BLUPs within clusters.

Variance components
The genetic variance, error variance, and heritability for each trial from the final fitted Spatial +FA models are presented in Table 6. The estimates for variance component parameters   24 for heritability. Two of the seven trials, Bako2020 and Assosa2019, had higher genetic variance for yield. This indicated that these testing locations had relatively high discriminating power for genotypes. This may have been attributed to substantially higher amounts and distribution of rain for Assosa and Bako in 2019 and 2020, respectively. This also emphasized the significance of taking meteorological data from a specific cropping season into account when recommending the best genotype for a given cropping season, as well as its broader application across the country's many agro-ecologies. In addition, the Bako2021 and Bako2018 trials were found to be poor trials with low genetic variance and heritability, which could also be attributed to low rain fall or drought at Bako in 2021. This also led us to exclude the BLUPs of these trials when averaging across trials for selecting superior genotypes. The heritability of yield at each trial is shown in Fig 4 using conversional, spatial, and spa-tial+FA analysis. It shows that using spatial analysis improved heritability, and that using  spatial+FA analysis improved heritability even more. In general, analyzing MET data with spatial and FA models improved genotype evolution precision and accuracy by capturing nongenetic variation associated with agricultural field experiments and appropriately exploiting the information stored in the MET dataset [31,32].

BLUPs for genotypes mean values across environments
A standard method for estimating random effects in a mixed model is best linear unbiased prediction (BLUP), which has the property of minimum mean square error of prediction, and they can form a more accurate estimation of the underlying effects. Genotype effects are mostly fitted as random terms in a plant breeding context where accuracy of genotype ranking is important for selection of superior genotypes. This is more required in the early stages of genotype trials conducted with a large number of entries. The performance of genotypes can be ranked based on the values of BLUPs averaged across correlated environments of the 1st cluster (C1), excluding Bako2021 and Assosa2019 since they showed low genetic correlation with the other trials and low genetic variance. Table 7 shows that more than 28% (13) of the 46 genotypes had average grain yields greater than 1.3 t/ha. However, the predicted mean grain yield indicated three genotypes with a higher mean yield than 1.6 t/ha: one was the check, Bako-09, and the other two were 203439 and 203325 (Table 7). BLUP analysis further indicated that Bako trials in 2020 and Assosa trials in 2019 yielded high grain yields, implying that these sites are the best testing locations for distinguishing between finger millet genotypes and the most suited agro-ecologies for finger millet crop production in general.

Discussion
Following advances in statistical science and knowledge, methods of data analysis for crop variety evaluation programs have improved over recent years [22,33,34]. However, these improved statistical techniques have not yet been well utilized in many crop variety evaluation programs, including finger millet breeding in Ethiopia. This study's data set was incomplete; that is, not all varieties were grown in all environments. This was problematic for statistical analyses using ANOVA-based approaches, such as AMMI and GGE bi-plot analysis. To compensate for our incomplete data set, we opted to use the linear mixed model framework. Control by blocking, which is based on observable factors like soil type and topography, is commonly used in on-station trials, but several possible inputs are unknown or unmeasured, which stresses the importance of incorporating spatial information into the analysis of experiments to improve the precision of genotypes evaluation trials. This is especially important for field trials in Ethiopia, where the problem varies from plot to plot, as some findings justify [35]. The spatial mixed model approach proposed by Gilmour et al. (1997) [25] has received particular attention as it simultaneously considers three types of spatial variation to be modeled: local, global, and extraneous variations. Despite the recommendations by several authors, these approaches are not widely used in crop variety evaluation programs as a routine data analysis platform. In this particular study, we took advantage of these approaches and results of the spatial analysis were compared to that of randomized complete block (RCB) analysis. The investigated spatial mixed models showed better data fitting, resulting in a substantial improvement in the estimates of genetic parameters. Furthermore, our spatial analysis detected the presence of global, extraneous, and local trends. These trends could be explained by the presence of diseases and insects within and between blocks, missing plots, improper irrigation distribution, and micro soil fertility spots as well as the experiment's spatial gradients or the presence of soil fertility, as suggested by Stefanova et al. [29]. It could also be linked to the residual dependence of plots. When spatial trends were taken into account, relevant research on maize [36], sorghum [37], barely [38], potato [39], and soybean [40] likewise produced excellent results. The ANOVA-based approach always assumes constant variance across environments and no correlation between environments. We conducted MET analysis by fitting heterogeneous variance for residuals and a factor analytic model to the GxE effects, which results in the Restricted Maximum Likelihood (REML) estimates for genetic correlation between environments, and hence cluster analysis using a dendrogram and bi-plot. The results of MET data analysis showed that modeling GE interactions with FA models in combination with models for spatial variations resulted in a significant improvement in the estimates of genetic parameters, and this was demonstrated with evidence of heritability. Results from a related study on sorghum [41], durum wheat [42], and maize [36] revealed that modeling field spatial correlation plus MET through FA combined under a linear mixed model also indicated significant improvement in heritability analysis. The FA models were found to be useful not only for estimating and predicting GxE interaction effects, but also for estimating GxE variance and performing bi-plot analysis. Correlated environments could be identified using the estimated GxE variance, and breeders can select genotypes using BLUPs averaged across correlated environments. The superior varieties were chosen based on the average BLUPs because it could accurately predict the exact ranking of each variety's performance [32]. Anuradha et al. [43] employed BLUPs analysis via simultaneous selection index. However, their BLUP analysis differed from ours because the BLUP in our case was enhanced by capturing spatial variations and modeling the genetic correlation structure between environments.
We found two clusters of correlated environments using the dendrogram and the bi-plot, which aided selection of superior finger millet varieties within each cluster. Only one cluster was chosen for the entire variety selection because it formed with relatively high heritability and encompassed more environments. The second cluster, on the other hand, was found to have low heritability and fewer environments.

Conclusion
Combining MET with spatial and FA models allowed for a better understanding of the genetic effect and enhanced the precision and accuracy of genotype evolution by allowing the breeder to also take the GxE interaction effect into consideration. This makes it possible to isolate the genetic effect or further explore the GxE interaction effect, depending on the goal. The genotypes with the highest potential for a further verification study that might be used as a released variety could be found using the fitted information.
Supporting information S1