Sparse testing designs for optimizing predictive ability in sugarcane populations

Sugarcane is a crucial crop for sugar and bioenergy production. Saccharose content and total weight are the two main key commercial traits that compose sugarcane’s yield. These traits are under complex genetic control and their response patterns are influenced by the genotype-by-environment (G×E) interaction. An efficient breeding of sugarcane demands an accurate assessment of the genotype stability through multi-environment trials (METs), where genotypes are tested/evaluated across different environments. However, phenotyping all genotype-in-environment combinations is often impractical due to cost and limited availability of propagation-materials. This study introduces the sparse testing designs as a viable alternative, leveraging genomic information to predict unobserved combinations through genomic prediction models. This approach was applied to a dataset comprising 186 genotypes across six environments (6×186=1,116 phenotypes). Our study employed three predictive models, including environment, genotype, and genomic markers as main effects, as well as the G×E to predict saccharose accumulation (SA) and tons of cane per hectare (TCH). Calibration sets sizes varying between 72 (6.5%) to 186 (16.7%) of the total number of phenotypes were composed to predict the remaining 930 (83.3%). Additionally, we explored the optimal number of common genotypes across environments for G×E pattern prediction. Results demonstrate that maximum accuracy for SA ( ρ=0.611 ) and for TCH ( ρ=0.341 ) was achieved using in training sets few (3) to no common (0) genotype across environments maximizing the number of different genotypes that were tested only once. Significantly, we show that reducing phenotypic records for model calibration has minimal impact on predictive ability, with sets of 12 non-overlapped genotypes per environment (72=12×6) being the most convenient cost-benefit combination.


Introduction
The world faces an urgent challenge to provide nutritional sustenance for a burgeoning 8 billion people (UNPF, 2023) amidst threats to food security, climate change, and finite resources (FAO, 2015;UNCCD, 2017UNCCD, , 2018)).Bridging the gap between food production and population growth necessitates innovative agricultural solutions.Sugarcane emerges as a key player in this scenario, serving dual purpose: i) as a primary sugar source (a dietary staple), and ii) bioenergy feedstock production (Goldemberg, 2008;Waclawovsky et al., 2010).The versatility of sugarcane makes it an ideal crop for targeted breeding efforts to enhance both yield (sugar and biofuel) (Hoang et al., 2015;Mahadevaiah et al., 2021) and yield stability (Scortecci et al., 2012).However, sugarcane breeding is challenged by long breeding cycles, low genetic diversity, large genome size and clonal propagation, limiting efficient genetic improvement and yield potential (Souza et al., 2011;Hoang et al., 2015;Yadav et al., 2020;Mahadevaiah et al, 2021).
In plant breeding, identifying superior cultivars for agricultural demands remain critical.Sugarcane's narrow genetic base, primarily due to vegetative propagation, complicates breeding efforts (Roach, 1989;Raboin et al., 2008;Wei and Jackson, 2016).Multienvironmental trials (METs) are essential in sugarcane breeding, enabling the evaluation of genotype performance across environments and identifying stable or specifically adapted cultivars (Jackson and Hogarth, 1992;Abu-Ellail et al., 2020).However, METs are resource-intensive, particularly for clonally propagated crops, prompting the need for more efficient trial designs.
Genome-wide molecular prediction (GWP), genomic selection (GS) or simply genomic prediction (GP) has been transformative in plant breeding, enhancing genetic gain rates over the past two decades (Crossa et al., 2010, 2011, Jarquıń et al., 2014;Ferrão et al., 2021;Voss-Fels et al., 2021;Mahadevaiah et al., 2021).GP uses genome-wide molecular markers to estimate breeding values of untested individuals (Meuwissen et al., 2001).By capturing the genetic variation in the genome, GP enables the prediction of complex traits that are otherwise challenging to assess using conventional breeding tools (Yadav et al., 2020;Ferrão et al., 2021).This approach facilitates early-stage breeding cycle decisions, reducing the need for extensive field trials and expediting cultivar development.In the context of sugarcane breeding, GP has been used to predict several traits of interest with moderate accuracy ranging from 0.2 to 0.45 (Deomano et al., 2020;Mahadevaiah et al., 2021;Islam et al., 2022;Islam et al., 2023), including both molecular and pedigree information (Deomano et al., 2020;Hayes et al., 2021) and modeling non-additive effects (Yadav et al., 2021).GP is particularly advantageous in predicting performance of individuals in new and partially tested environments through cross-validation (CV) schemes.This practice ensures that the predictive ability of GP models extends beyond the environments in which genotypes are bred, allowing accurate selection of individuals with potential high performance across a range of environmental and management conditions.Empirical studies suggest that incorporating G×E in GP models can streamline the breeding process, either by skipping certain stages or by reducing the number of genotypes tested in fields, thereby enhancing trial testing capacity (Crossa et al., 2017;Resende et al., 2018;Jarquıń et al., 2020).
Sparse testing, the strategic selection of a subset of genotypes for evaluation in target environments, offers an innovative alternative to field trials for the entire population This approach leverages statistical techniques and predictive models, such as GP, to extrapolate the performance of the entire population from a selectively observed subset.Through model calibration and CV, sparse testing enables prediction of unobserved genotype-inenvironment combinations.Consequently, it optimizes resource allocation, allowing breeders to focus on the most promising candidates from the target population of environments (TPEs).This methodology not only accelerates the breeding cycle but also the cost-effective identification of superior cultivars.
Sparse testing designs often mirror CV schemes CV1 and CV2 (Burgueño et al., 2012;Jarquıń et al., 2017), encompassing a broad range of genotype-in-environment combinations.In the CV1 scenario, genotypes never tested at any environment are predicted, while CV2 involves predicting already observed genotypes in incomplete METs.This raises critical questions about the optimal design of sparse testing in METs, such as the balance between testing a few genotypes across multiple environments versus many genotypes in fewer environments; and the trade-off between prediction accuracy and selection intensity.Empirical evidence from crops such as maize (Jarquıń et al., 2020;Atanda et al., 2021;Montesinos-Loṕez et al., 2023), wheat (Crespo-Herrera et al., 2021;Atanda et al., 2022) and soybean (Persa et al., 2023) has provided insights into optimizing sparse testing designs.Research by Jarquıń et al. (2020) and Crespo-Herrera et al. (2021) revealed that GP models incorporating G×E can maintain robust predictive ability even with reduced training sets.These studies indicate that the highest recovery of predictive ability in scenarios with either non-overlapping or completely overlapping genotypes in training set combinations mostly depends on the species.
In this context, this study investigates the predictive ability of different sparse-testing allocation designs in sugarcane breeding.We simulated the allocation of finite breeding resources to optimize the prediction of un-phenotyped genotypes in METs.For that, we varied the number of overlapping genotypes and training set sizes, similarly to the approaches of Jarquıń et al. (2020) and Crespo-Herrera et al. (2021).We considered three allocation scenarios: i) Non-Overlapping Genotypes (NOG), with a single observation per genotype across environments, ii) Overlapping Genotypes (OG) where all training genotypes are phenotyped across all environments, resembling the CV1 scenario, and iii) a combination of the NOG and OG schemes.To assess the contribution of capturing G×E in METs, three models were fitted for each sparse testing case: i) main effect of environment and genotype (M1); ii) main effect of environment, genotype and genomic markers (M2); and iii) main effect of environment, genotype, genomic markers, and the interaction between genomic markers and environment (M3).Our findings demonstrate effective resource optimization strategies for METs in sugarcane breeding, potentially increasing testing capacity by fivefold within a fixed target set of genotype-in-environment combinations or reducing total phenotyping cost between 83% to 93%.

Plant material and phenotyping
The dataset analyzed in this study comprises 220 genotypes (Jaimes et al., 2024)  Genotypes were planted at each location under a randomized incomplete block design.In Balsora and Taula Mejia 3 replications were planted, while in Porvenir, due to field limitations, only 2 replications were considered.The experimental unit at each location comprised a plot of 5 rows, each 5 meters long and spaced 1.65 m apart.Commercial checks, genotypes S29, S64, and S177, were replicated multiple times within each replicateblock, resulting in a total of 717, 720, and 576 experimental units for Balsora, Taula Mejia, and Porvenir, respectively.
Phenotypic data for sucrose accumulation (SA) and tons of cane per hectare (TCH), were collected during harvest (13 months after planting).SA was measured by shredding 10 stalks per experimental unit to obtain the juice per sample (Larrahondo and Torres, 1989).Then the sucrose content was quantified from the extracted juice through a near infrared (NIR) spectrophotometer (Larrahondo and Torres, 1989).For TCH, the measurement was taken after weighing all stalks per experimental unit.Both SA and TCH measurements were conducted for both plant cane and first ratoon at all locations.

Genotyping and quality control
DNA was extracted from each of the 220 genotypes following Dellaporta et al. (1983) protocol.Sequencing was conducted using both Genotyping-by-sequencing (GBS) and Restriction-site Associated DNA sequencing (RADSeq) on a Hiseq2000 Illumina System at a depth of 105X and 38X for GBS and RADSeq, respectively.Quality control was performed using FastQC (Andrews, 2010) and Trimmomatic (Bolger et al., 2014).Subsequently, the cleaned data were aligned and mapped to the CC 01-1940 monoploid reference genome (Trujillo-Montenegro et al., 2021), utilizing Bowtie2-2.5 (Langmead and Salzberg, 2012) with default parameters.Variant calling was conducted with NGSEP 4.0.2(Tello et al., 2019) by filtering out markers that failed to meet the following quality control criteria: a Minor Allele Frequency (MAF) below 3%, Phred score below 30, distance between markers below 5 bp, presence of more than 50% of missing data, and an average depth below 5X, as well as markers with more than one allelic version.Post quality control, a finalized count of 22,324 single nucleotide polymorphism (SNP) markers remained for analyses.The markers were coded based on the scaled allele content with values ranging between 0 and 2 in steps of 0.2 {0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, and 2}.

Phenotypic adjustment
The statistical model was independently applied to raw data at each environment aiming to obtain the Best Linear Unbiased Predictor (BLUP) values of the genotypes considering the three checks assigned to each block as fixed effects for microenvironmental control (Belamkar et al., 2018;Xavier et al., 2022) using the linear predictor described in Equation 1.
where y ikl denotes the phenotypic record of the i th genotype in the l th incomplete block within the k th replication, µ is the overall mean, Check corresponds to the fixed effects of the checks, L i is the random effect of the i th genotype, r k is the random effect of the k th replication, b l(r) is the random nested effect of the l th incomplete block within the k th replication and e ikl is the corresponding error term.These terms are considered as independently and identically distributed outcomes, following a normal distribution with a covariance structure based on the identity matrix I and scaled by their corresponding variance component s 2 .

Allocation strategies
From the initial 220 genotypes, this study focused only on 186 Colombian cultivars.Thus, a total of 1,116 genotype-inenvironment combinations (or phenotypes) resulted from the combination of the 186 Colombian genotypes observed at the six different environments.In practical scenarios, budgetary constraints imply that only a subset of these potential phenotypes can be realistically observed in the field.Therefore, strategies are essential in the allocation of genotypes across various environments.The goal is to create a calibration set comprising phenotypic records that maximize the accuracy of prediction models.This is key for obtaining accurate estimates of the performance of genotypes that are not observed in a set of environments.In this study, allocation optimization was executed by classifying the genotypes into one of the two following categories: Non-overlapping Genotypes (Genotypes seen only in one environment): In this category, genotypes are randomly assigned to specific environments in such a way that they are unique and observed only once across environments.For the case of study, this implies that each of the six environments would host 31 unique genotypes (186/6 = 31), guaranteeing that all genotypes are assigned to one environment only with no overlapping across environments.
Overlapping Genotypes (Genotypes seen in all the environments): Some genotypes that exhibit consistent performance across various environments or that are of particular interest for research purposes can be allocated in all six environments.This overlapping ensures that these genotypes are evaluated under diverse conditions, assessing their adaptability and stability across environments.
Let A and B represent, respectively, the number of nonoverlapping and overlapping genotypes, and T the number of environments or trials.For a given 'A/B' design, the number of required phenotypes is (A+B)×T, the number of unique genotypes tested is A×T+B and the average number of replicates/phenotypes per genotype is (A+B×T)/(A+B).Assuming a fixed budget for 186 plots and uniform plot costs in all the environments, two extreme allocation strategies arise.
-Testing same genotypes in all environments (0/31): Here, a subset of 31 unique genotypes from the original set of 186 is phenotyped in all the environments with no specific or nonoverlapping genotypes at each environment (A=0) and all the 31 being overlapped genotypes (B=31).The number of phenotypes is still (0+31)×6=186 but the number of unique genotypes is reduced to (0×6)+31=31, and the number of phenotypes per genotype is increased to (0+31×6)/(0+31)=6 phenotypes/genotype.
In addition, we can construct mixed designs by adding one overlapping genotype and removing one non-overlapping genotype from each environment.For instance, the design 16/15 is composed of 16 non-overlapping genotypes per environment and 15 overlapping genotypes phenotyped in the six environments.In this case, the number of phenotypes remains the same (16+15)×6=186 but the number of unique genotypes is 16×6+15=111 and the average phenotypes per genotype is (16+15×6)/(16+15)=3.42.
Starting from the full non-overlapping design (31/0), nine designs resulted after increasing the number of overlapping or common genotypes (B) by four with respect to the previous design, except for the first case where only three genotypes were added to B. These designs are 31/0, 28/3, 24/7, 20/11, 16/15, 12/19, 8/23, 4/27, and 0/31.The set of phenotypes or genotype-inenvironment combinations observed in fields, (A+B)×T=186, constitutes the calibration set for training models to predict the phenotypes of the remaining unobserved genotype-in-environment combinations (1,116-186=930 combinations).A visual representation of the designs is shown in Figure 1.
To test the allocation of resources in terms of predictive ability, not only regarding the proportion of non-overlapping and overlapping genotypes, but also with respect to different budget constraints, additional sample sizes were considered for composing training (168, 144, 120, 96, and 72 phenotypes).All calibration set designs are shown in Table 1, where the columns in the middle represent different designs (compositions) and the rows indicate the different calibration sizes.Ten random partitions or replicates were considered to evaluate the predictive ability of the different designs (combination of set size and composition).To ensure a fair comparison among sparse designs, the size of the prediction set was fixed to 930 (83.3%) for all these.This entails that some genotype-in-environment combinations were excluded from consideration in rows 2 to 6, neither included in either the calibration or the prediction set.

Predictive models
Three different GP models, based on random effects and modeled via covariance structures, were calibrated and used to predict the testing sets.The first model M1 is considered the base model, and it assumes independent and identically distributed (IID) outcomes among its components (Equation 2).The second model, M2, leverages molecular marker information to determine relationships between pairs of genotypes (Equation 3).While M1 and M2 are main effects models, the third model M3 introduces the G×E term (Equation 4).The model's performance was assessed using the Pearson's correlation between the observed and the predicted values within each environment.The traits SA and TCH were analyzed separately.
M1: Environment and genotype main effects.
where y ij is the adjusted phenotypic observation of the target trait in the j th environment (j=1, 2,…, 6) for the i th genotype (i=1, 2, …, 186), µ is the overall mean, E j is the random effect of the j th environment, L i is the random effect of the i th genotype, and e ij is the corresponding error term.E j , L i , and e ij are assumed to be independent and identically distributed outcomes from a normal density.Z E (n×6) and Z L (n×186) are the design matrices connecting phenotypes with environments and genotypes, respectively, and I (n×n) is the identity matrix, with n being the number of total observations (n=186×6).The variance component for each term is denoted by s 2 .In this model, the genotypes are assumed to be independent, and therefore, no information can be borrowed from phenotyped individuals to their nonphenotyped relatives.
M2: Environment, genotype, and genomic markers main effects., where all the terms in common with M1 have the same meaning, g i is the genomic effect based on maker SNPs for the i th genotype, and K (186×186) is the kinship matrix describing genomic similarities between pairs of individuals (VanRaden, 2008).Both L i and g i terms capture information about the genotype but in the latter, genomic information is used to relate phenotyped and non-phenotyped individuals.The genotype effect L i is not removed from the model to account for model missspecification and imperfect information (genetic variability that cannot be explained through SNP markers only).
M3: Environment, genotype, and genomic markers main effects, and Genotype × Environment interaction.where all the terms in common with M2 remain the same, and the added term gE ij represents the effect of the interaction between the j th environment and i th genotype (Jarquıń et al., 2014), ⊙ is the element-wise multiplication or Hadamard product operation between two matrices.The first column indicates the size of the calibration set across environments (relative size with respect to the total potential number of combinations).The last two columns show, the size of the prediction set (constant) and the number (and proportion) of phenotypes discarded from the calibration set.All the designs in the same row share a common calibration set size while the designs in the same column share a common number of non-overlapping genotypes.

Cross validation and performance metrics
In the context of plant breeding, particularly in the GP domain, the evaluation of models often involves the use of CV schemes (Jarquin et al., 2021).One widely used method is the K-fold CV, which helps to assess the performance of predictive models.K-fold CV involves splitting the dataset into K subsets or folds.The process then iterates K times, each time using one of the K subsets as the prediction set while the remaining K-1 subsets are used for calibrating the predictive models.This approach provides a robust way to assess the model's performance.It returns predicted values for all the datapoints in a fold without any of them being present in the calibration set.
The process of partitioning the dataset into folds is not trivial and might include some constraints.In this study, we included features of two common CV schemes: CV2 and CV1.In the basic CV scheme, known as CV2, the dataset is randomly partitioned without any specific constraints.This means that the model may predict the phenotype of genotypes that have been observed in other environments.While this approach is informative, it doesn't guarantee that the model will predict unseen genotypes, as some genotypes may appear in both the calibration and the prediction sets due to random partitioning.In CV1, a specific constraint is applied: all phenotypes of the same genotype must be placed in the same fold.This ensures that when evaluating the model, none of the observations of a genotype in any environment are present in the calibration set.This constraint guarantees that the models will predict an unseen genotype, as all observations of that genotype are in the prediction test.
The allocation designs constitute the spectrum between CV2 and CV1, with 31/0 being the most extreme case that could occur on a CV2 assignment, and 0/31 being the CV1 scheme.The higher the number of overlapping genotypes, the closer to the CV1 scheme.All the designs, except those with 0 non-overlapping genotypes, are CV2-like schemes.In the context of predictive modeling, the CV1 scheme poses a greater challenge, especially for models that do not incorporate genomic information and treat genotypes as independent entities.This challenge arises because the calibration set does not provide any useful insights during the model training process for predicting a specific, new, and unseen genotype.
To evaluate the performance of predictive models in this study, the Pearson's correlation and the Mean Squared Error MSE were used to measure the association between the adjusted phenotypes and the predicted values.These were computed for each environment or trial, reflecting how well the models performed when predicting the genotypes in specific environmental conditions.Averaged correlations and MSE values from the 10 replications or partitions were obtained.
To further interpret the results of models' performance, we computed the percentage of variance explained by each term in the different fitted models after conducting a full dataset analysis (i.e., with no missing values).

Software
Genomic prediction analyses were computed in the R environment (R Development Core Team, 2023) and the models were fitted using the BGLR package (Peŕez and De Los Campos, 2014).All models were fitted using 12,000 Markov Chain Monte Carlo iterations, incorporating a burn-in phase for the initial 2,000 iterations and a thinning factor of 5.

Variance components
The percentage of variance explained by each term for SA and TCH is shown in Table 2.For both traits, the environmental component captured the largest proportion of variability.However, there are significant differences between traits, with 46%-48% of the total variance explained for SA while 80%-82% for TCH.Examining the decomposition of the genetic variance, we found that for SA the variance explained by genotype in model 1 (56.2% of the within-environments variance) is higher than the sum of the variance explained by genotype and genomic markers (12.9%The first set of columns on the left side depict the percentage for all model terms: main effect of environments (vE), main effect of Genotype (vL), main effect of genomic markers (vg) and the interaction between genomic markers and environments (vgE), as well as the residual or unexplained variability (vR).The second set of columns on the right side depict the within-environment variance, i.e., the relative contribution of each term without considering the variance explained by the main effect of the environments.
Conversely, the variance explained by the interaction between genomic markers and environment is orthogonal to the other components, explaining the variability that cannot be captured by the main effects, thus reducing the residual variance.

Phenotypic correlation between environments
The phenotypic correlation values between environments for both SA and TCH traits are shown in Figure 2. A wide range of positive correlations was found between environments, with no high correlation values between environments sharing locations except in the case of SA in Balsora environments (r=0.822).Correlation values for Balsora environments with other locations were higher than those between Taula and Porvenir for SA.For TCH, we found lower correlations, with the maximum values observed between Taula 2019 and Porvenir environments (r=0.621 and r=0.588 for 2020 and 2021, respectively).
Figure 3 presents the accuracy achieved at each environment, as well as the overall mean accuracy and MSE values.The original values used to compute the means, as well as the values for scenarios with reduced calibration set are presented in Supplementary Table S1.The accuracy was generally higher when predicting SA compared to TCH for models M2 and M3 (M1 returned low results due to the lack of information to connect calibration and testing sets).As a general trend, the accuracy decreases as the number of overlapping genotypes increases.However, within the SA trait, the maximum accuracy was attained at the '28/3' allocation design.Furthermore, a local optimum was identified at the '12/19' allocation design, indicating that specific combinations of overlapping and non-overlapping genotypes can lead to improved predictive performance.
Regarding the models used in the study, it was found that M3, which accounts for G×E, generally outperformed the other models, but there were not significant differences with respect to M2, as stated in Supplementary Table S2.Additionally, as expected, the base model M1 consistently showed the lowest predictive accuracy across all scenarios, with the performance gap widening as the number of overlapping genotypes increased.Notably, the study consistently achieved the best predictions in specific environments, namely E1 (Balsora 2016), E2 (Balsora 2017), and E4 (Taula 2019), as can be noted in Supplementary Table S1. Figure 4 shows the average accuracy obtained across all calibration sizes and designs.In alignment with the results from Figure 3, we found that accuracy tends to decline as the number of non-overlapping genotypes decreases.This trend holds true for all models and designs for TCH, with a few exceptions for SA, specifically '12/19', '8/20', and '8/12' for models M2 and M3.
In contrast to the impact of decreasing the number of nonoverlapping genotypes, reducing the calibration size (i.e., the number of plots) by allocating a lower number of overlapping genotypes appears to have a minimal effect on accuracy, as shown in the main diagonals (i=j) of Figure 4. Remarkably, in models M2 and M3, the accuracy for the '12/0' design surpasses that of all other designs within the '12/B' family, except for the one with the larger calibration size and, therefore, number of overlapping genotypes ('12/19').This suggests that similar accuracy levels could be achieved by either decreasing the calibration size and using fewer overlapping genotypes or by maintaining the calibration size while evaluating more non-overlapping genotypes.

Discussion
The application of genomic prediction has found practical utility in sugarcane breeding.Recent studies have indicated moderate predictive ability for complex traits, including yield, fiber and sugar content, with great potential of increasing genetic gain (Deomano et al., 2020;Hayes et al., 2021;Yadav et al., 2021).Despite this, there is a need to find a balance between increasing genetic gain and reducing resource allocation in sugarcane breeding.Given fixed financial resources, it is often challenging to determine how to allocate a  Average accuracy for all designs and calibration sizes.Each subplot represents the accuracy obtained by each combination of model and trait.Each grid within a subplot represents the accuracy for a specific design and calibration size, with yellow, green and blue colors representing high, medium and low accuracy values, respectively.Grids in the same row share the calibration size while grids in the same column share a common number of non-overlapping genotypes.The specific designs are presented below the columns.predetermined number of genotypes for evaluation in targeted environments.To address this, optimization of resources in METs has become a routine practice.Among the strategies employed, the sparse testing allocation scheme with genomic prediction has been promising.This approach could identify the minimal number of candidate genotypes required for evaluation in METs and strategize their distribution in replicated or unreplicated field designs to achieve maximum genetic gains.In this study, we explored different sparse testing designs to determine the optimal calibration size and the trade-off threshold between prediction accuracy and selection intensity in sugarcane breeding.
One of the most notable trends is that CV1-like schemes ('0/B') pose significant challenges for M1.This phenomenon could be attributed to the fact that the predicted value of a genotype in each environment depends on the information of that exact genotype in other environments, which is not available in CV1 scenarios.By borrowing information from related genotypes through genomic information, M2 and M3 can extract patterns from the phenotypes of related genotypes and therefore, have a reasonable performance on CV1 scenarios.Similar results have been reported in a previous study by Atanda et al. (2022), where genomic and pedigree relationship between individuals can track segregating Quantitative Trait Loci, thus, explaining a large proportion of the genetic variance in a population.
On the other extreme, for the CV2-like schemes ('A/0') there were no notable differences in the predictive abilities of the models, even though they showed high predictive power (Figure 4).This is because the environmental effects are confounded with the genotype effect such that a single observation of a genotype in an environment is enough to predict its performance in other environments.Therefore, predictive ability strongly depends on the phenotypic information of tested individuals.The incorporation of genomic information explained a low genomic variance (SA: 17.9%, and TCH: 2.3%) in M2 (Table 2) and the model accounting for genotype by environment interaction in M3 captured a small information from related genotypes evaluated in correlated environments (SA: 15.4%, and TCH: 4.5%).This result implies a high correlation between the environments, given the observed small genotype differentiation across them.Other studies have also reported high genetic correlation between environments in sugarcane breeding programs (Deomano et al., 2020).As such, there is a need to redefine the target population of environments to avoid resource wastage during METs.These environments can be classified using important environmental factors and consolidated into megaenvironments for efficient resource allocation (Resende et al., 2021).
The best accuracy is consistently obtained in scenarios with high 'A' value.The larger the number of different genotypes observed at least once, the higher the performance achieved.This suggests that G×E is small and, therefore, phenotyping a genotype in one environment provides useful insights for its prediction in the remaining environments.However, this might seem contradictory to the findings in Table 2, where a significant portion of the Withinenvironment variance was attributed to the G×E effect.It is crucial to interpret those results with caution because, while all the 1,116 available phenotypes were used to calibrate the model that estimates the variance components, the predictive models were trained using a maximum of 186 phenotypes.
Moreover, the slight gain in accuracy obtained by M3, which accounts for G×E, occurs in the scenarios with more nonoverlapping genotypes.These findings indicate a potential cost saving opportunity can be achieved by allocating the genotypes to non-overlapping environments, that is, observing each genotype once across environment.This could decrease the cost of METs significantly in sugarcane breeding, as overlapping genotypes in some environments showed no gains in genomic prediction.This result is similar to the findings of Jarquıń et al. (2020) in maize where marginal gains were observed even when overlapping genotypes were increased significantly.The plausible explanation is either the environments share strong environmental similarly or the traits exhibit low response to genotype-by-environment interaction.In that sense, Balsora, Taula and Porvenir locations are characterized by having the same soil taxonomy with soils from vertisol, inceptisol and mollisol orders, which have a fine, dry, and soils from deep to moderately deep (Carbonell et al., 2001), making them locations with similar behavior.However, even though the locations (E1 to E6) have similar soil profiles, Taula (E3 and E4) and Porvenir (E5 and E6) also share a lower drainage capacity when compared with Balsora (E1 and E2), affecting the performance of the planted varieties, impacting positively the correlations of SA and TCH among locations.
The breeding program at Cenicaña involves making decisions after 12 years of selection based on the phenotypic information collected during the early stages of selection (three stages).The METs provide information of genotype adaptability to the target population of environments of Colombia (CENICAÑA (Centro de Investigacioń de la Caña de Azućar de Colombia), 1995).It is becoming important to integrate the allocation of phenotypes in plant breeding programs to optimize limited resources and increase genetic gain.Given the availability of genomic information during the breeding process, this can be leveraged to either evaluate more genotypes by increasing selection intensity or maximize the genetic gains with a fixed plot unit cost in sparse testing.As such, findings from this study in the case of sugar cane indicated that a cost saving opportunity could be achieved by overlapping a small number of genotypes in all environments while allocating the remaining genotypes to different environments.In addition, the use of a genomic prediction model that incorporates G×E (M3) allowed to capture genetic information among related genotypes in other environments to improve predictive ability.
The assumption of independence between genotypes held in M1 leads to biased models that perform poorly in comparison with the models that integrate genomic information to connect these.However, there is a clear trend to decrease accuracy with decreasing calibration set sizes.While the best accuracy is obtained with the full calibration set, there are designs with reduced calibration sizes that equal or even overpass the equivalent designs with larger calibration sizes.This is found especially in SA trait.Therefore, we can obtain similar accuracy values by reducing the budget or keeping the same budget but increasing the number of unique genotypes evaluated by increasing the number of non-overlapping genotypes.This is similar to what has been obtained in maize, wheat, and soybean sparse testing allocation designs (Jarquıń et al., 2020;Crespo-Herrera et al., 2021;Persa et al., 2023) but the difference we observed in sugarcane is that increasing the number of overlapping genotypes did not resulted in an increased predictive ability.This might also be due to the reduced training set sizes of our dataset compared with these studies.Interestingly, there are major differences in the average accuracy between traits.In the optimum scenarios, '31/0' and '28/3', the accuracy values for SA are 0.603 and 0.611, while for TCH are 0.341 and 0.332.SA is a more heritable trait with an extended genetic control while TCH trait is highly dependent on environmental conditions, specifically precipitation.Moreover, these differences might be caused by the phenotyping procedures.

Conclusions
In this study, we investigated the potential of genomic prediction for sparse testing resource allocation in sugarcane METs.It involved the utilization of different ratios of NOG/OG sugarcane clones and genome-based models, including G×E, to capture more genetic variability other than the main genomic effects.The obtained results indicated that genomic prediction models that incorporated G×E had the highest predictive response compared to other models in all allocation design scenarios.However, the advantage of the genomic model with G×E decreased with increasing NOG where highest predictive values were obtained.While this study focused on maximizing the genetic gain with a fixed cost per phenotype in sparse testing, the results showed that reducing the sample size of the genotypes assigned to environments (NOG) decreased the accuracy of genomic prediction.The trend decreased further with increasing overlapping genotypes evaluated across environments.This indicates that very few overlapping genotypes are needed across environments.This was attributed to a high environmental effect on the traits and moderate phenotypic correlation between environments.Generally, the results from this study showed that models including G×E can reduce resource allocation for phenotyping by up to 83% or increase the testing capacity by fivefold for multi-environmental trials in sugarcane.Therefore, sparse testing with genomic prediction is a promising strategy for maximizing genetic with fixed phenotyping cost in a breeding program.Premised on large environmental variance for the traits, we recommend the use of environmental factors to define TPEs to avoid investing limited resources in correlated environments without corresponding increase in genetic gain.

FIGURE 1
FIGURE 1 Graphical representation of sparse testing designs for a fixed budget and for a data set comprising 186 phenotypes and six environments.Each subplot represents a different design.The design name, number of unique genotypes and average phenotypes per genotype are shown at the top of each subplot.Rows and columns represent genotypes and environments, respectively.Blue color indicates that a genotype is observed in only one environment, while orange color indicates overlapping genotypes observed in all the six environments.White color is used to represent phenotypes not used in the calibration set and potentially forming the prediction set (155 phenotypes per environment or equivalently 930 phenotypes across environments).

FIGURE 2
FIGURE 2Phenotypic correlation between environments.The lower and upper triangular matrices represent, respectively, SA and TCH traits.

FIGURE 3
FIGURE 3 Average accuracy of models.Due to the increasing number of combinations, only the results for the allocation designs with the largest calibration set size (186) are shown for SA (left) and TCH (right).The x-axis represents the allocation designs, and the y-axis represents the accuracy measured as the Pearson's correlation (top panels) and Mean Squared Error (bottom panels) between predicted and observed values.Color denotes models.Dotted lines represent one specific environment.Solid lines and highlighted points represent the average accuracy across six environments.

TABLE 1
Resource allocation designs.

TABLE 2
Across and within environments percentage of explained variability by each model term for each trait.