Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data

Modern agriculture uses hyperspectral cameras to obtain hundreds of reflectance data measured at discrete narrow bands to cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra, depending on the camera. This information is used to construct vegetation indices (VI) (e.g., green normalized difference vegetation index or GNDVI, simple ratio or SRa, etc.) which are used for the prediction of primary traits (e.g., biomass). However, these indices only use some bands and are cultivar-specific; therefore they lose considerable information and are not robust for all cultivars. This study proposes models that use all available bands as predictors to increase prediction accuracy; we compared these approaches with eight conventional vegetation indexes (VIs) constructed using only some bands. The data set we used comes from CIMMYT’s global wheat program and comprises 1170 genotypes evaluated for grain yield (ton/ha) in five environments (Drought, Irrigated, EarlyHeat, Melgas and Reduced Irrigated); the reflectance data were measured in 250 discrete narrow bands ranging between 392 and 851 nm. The proposed models for the simultaneous analysis of all the bands were ordinal least square (OLS), Bayes B, principal components with Bayes B, functional B-spline, functional Fourier and functional partial least square. The results of these models were compared with the OLS performed using as predictors each of the eight VIs individually and combined. We found that using all bands simultaneously increased prediction accuracy more than using VI alone. The Splines and Fourier models had the best prediction accuracy for each of the nine time-points under study. Combining image data collected at different time-points led to a small increase in prediction accuracy relative to models that use data from a single time-point. Also, using bands with heritabilities larger than 0.5 only in Drought as predictor variables showed improvements in prediction accuracy.


Background
Plant breeding programs routinely perform early field evaluations of large numbers of candidates for selection based not only on the main primary trait-grain yield measured in different environments-but also on several secondary traits related to yield, such as disease resistance. Methods that could help breeders measure grain yield based on other secondary traits in the early stages of plant growth could be of value to help reduce evaluation time and cost [25]. In recent years, the use of remote or proximal sensing, hyperspectral imaging, and laser scanners has helped develop low-cost, efficient highthroughput phenotyping platforms (HTPP) [1] which aim to collect data at low cost on many phenotypes of large numbers of breeding individuals at different stages of

Open Access
Plant Methods *Correspondence: j.burgueno@cgiar.org 1 International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 Mexico City, Mexico Full list of author information is available at the end of the article plant growth under different environmental conditions. This could drastically increase the number of traits that can be quantified on field-grown plants, selection intensity, prediction accuracy, and, therefore, the response to selection [25].
The basis of remote sensing and spectral science is the ability to measure electromagnetic energy at varying wavelengths that interact with different parts of the growing plant. The goal of spectral science is to measure a phenotype quantitatively through the interaction between light and plants, such as reflected, absorbed, transmitted and/or emitted photons. This is possible because each component of plant cells and tissues has wavelength-specific absorbance, reflectance and transmittance properties [16]. For example, a healthy plant interacts (absorbs, reflects, emits, transmits and fluoresces) with electromagnetic radiation in a different way than an infected plant [16].
In practical applications of HTPP to agriculture, the reflectance of electromagnetic energy at different wavelength bands is usually summarized in scores of spectral vegetative indices (VI) that are further used to predict plant physiological issues or agronomic traits. Spectral VI are a simple and convenient way of extracting information from remotely sensed data that facilitates the processing and analysis of large amounts of data acquired by modern cameras and satellite platforms [13,18]. Significant advances have been achieved in understanding the nature and proper interpretation of spectral VI [18,21] and theoretical frameworks have been proposed to support the development of indices optimized for particular applications. Popular VIs are the Normalized Difference Vegetation Index (NDVI), the Canopy Water Mass Index [30] and the Modified Normalized Difference at 705 nm wavelength (mND; e.g., [26]). Vegetative indices have been applied successfully in some crops [3][4][5][6][7][8][9]31]. For example, the canopy temperature (CT) and NDVI indices have been applied to estimate yield, taking advantage of the correlation between yield and these two VIs [2,15,17,22]. Also, it has been documented that the air-canopy temperature difference index can be used as a selection criterion in wheat (Triticum aestivum L.) breeding programs to estimate yield [24]. The NDVI has also been used successfully to estimate wheat yield before harvest at the regional and farm scale [15].
Although several spectral VIs are positively correlated with grain yield and other important agronomic and physiological traits, they do not consider all the spectral bands from the hyperspectral sensors [28,29]. Nevertheless, cameras with high spectral resolution can produce data on hundreds of spectral bands that can be further used to capture a wide range of information. Also, despite some successful applications of spectral VIs, most of these indices tend to be species-specific and, therefore, are not robust when applied across different species that have different canopy architectures and leaf structures because they use only a fraction of the available information on the measured wavelengths.
The important idea is that for sensor data to be meaningful, algorithms need to be developed to interpret the data and extract the most useful information to be translated into important traits for plant breeding. Thus, the use of high-resolution images is important to develop prediction models for grain yield, yield components, and relevant physiological and agronomical traits. However, the enormous volume, variety, and velocity of HTPP data generated by such platforms make it a 'big data' problem. Big data generated by these near real-time platforms must be efficiently archived and retrieved for analysis [27]. The analysis and interpretation of these large datasets is quite challenging, although several authors have proposed using sensor data through a linear regression based on standard ordinary least squares [29]. To overcome collinearity among predictors (bands from the sensor), Hernandez et al. [14] concluded that penalized ridge regression models from spectral reflectance data at anthesis or grain-filling predict grain yield well under different water levels. Recently, Ferragina et al. [12] proved that high-dimensional Bayesian regression models (similar to those used in genomic selection for predicting the performance of unobserved individuals based on a large number of markers) can be used to derive functions of hundreds of wavelengths. However, no Bayesian regression models or other functional regression models that define the function of wavelength for the prediction of grain yield and other traits in different environmental conditions have been studied in plant breeding [6].
Based on the above considerations, the main objectives of this study are: (1) to compare prediction accuracy of eight conventional spectral VIs (see Table 1) versus seven models that include ordinary least squared regressions for each spectral VI and all spectral VIs combined, Bayes B with all bands, Principal Components (PC) Bayes B regression, and three functional regression models, spline regression, Fourier regression and Partial Least Regression (PLS); (2) to identify the best models in terms of prediction accuracy; and (3) to identify time-points of plant growth before harvesting from which accurate predictions for wheat grain yield can be obtained. We illustrate the use of the different methods and models with data on grain yield collected on 1170 CIMMYT wheat lines evaluated in five contrasting environments. A total of 250 wavelengths were used on nine different time data points of crop growth.

Data
All 1170 lines were evaluated in all environments [Drought (severe drought), EarlyHeat (irrigated early planting for heat at sowing), Irrigated (irrigated bed planting), Melgas (irrigated flat planting) and Reduced Irrigated (moderate drought)]. In each environment, the lines were included in 39 trials, each comprising 30 lines; in each trial, the lines were studied using an alpha-lattice design with three replicates and six experimental blocks. The planting dates were all in 2013 as follows: Early-Heat on October 30, Irrigated and Reduced Irrigated on November 21, Drought on November 26 and Melgas on December the 1st. The traits measured on each line were grain yield (GY) and days to heading (DH), but only GY was analyzed in this study.
The image data was obtained using a Piper PA-16 Clipper flight that was fitted with a Hyperspectral camera (Model: A-series, Micro-Hyperspec Airborne sensor, VNIR Headwall Photonics, www.headwallphotonics. com, Fitchburg, Massachusetts, US) and thermal camera (A600 series Infrared camera, FLIR, www.flir.com, Boston, US). The plane flew at 270 m above the surface.
The aerial high throughput phenotyping (HTP) data was measured around solar noon time every date, aligning the plane to the solar azimuth for the data acquisition. Images of the experimental fields were obtained and formatted to tabular data by calculating the mean value of the pixels inside the center of each individual trial plot represented as a polygon area on a map. The software used to achieve this was ArcMap (ESRI, USA, CA).
On the data processing, the 38 cm per pixel CT data was corrected with a linear calibration of slope 1.2253 and Y intercept −6767.9 with the software ImapQ (Alava Ingenieros, Madrid, Spain). The several individual images of each flight were used to compose a unique mosaic per date with the software Autopano Giga (Kolor SARL, France). Then they were manually georeferenced using ArcMap (ESRI, USA, CA). The original image data is stored in kelvin units ×100, the next formula was applied with ENVI software (Excelis VIS, USA, CO) to convert the pixel values to Celsius degrees: (Pixel value)/100 − 273.15.
As well, the 30 cm per pixel hyperspectral data was processed with the software HyproQ (Alava Ingenieros, Madrid, Spain). First the images had a radiometric calibration with coefficients provided by the Laboratory for Research Methods in Quantitative Remote Sensing of the Consejo Superior de Investigaciones Científicas (Quan-taLab, IAS-CSIC, Spain) derived with a calibrated uniform light source; additionally the dark frame subtraction was performed to reduce the noise of the sensor. Corrections to decrease the effects of the atmosphere conditions in the images was performed modeling irradiance based on sun-photometer field measurements (Microtops II, Solar Light Company, PA, USA). The images were orthorectified and coarsely georeferenced based on the builtin Inertial Navigation System (INS/GPS). For the data extraction, where the image did not overlay the plots polygons because of INS inaccuracy, they were manually aligned to it in ArcMap.
The bands were measured on nine different dates (January 10, 2015; January 17, 2015; January 30, 2015; February 7, 2015; February 14, 2015;February 19, 2015;February 27, 2015;March 11, 2015;and March 17, 2015; which we called time-points 1, 2, 3,…, 9, respectively) using 250 discrete narrow wavelengths. In each plot, 250 wavelengths λ 1 , … λ 250 from 392.03 to 850.69 nm were measured for each wheat line. The ith discretized spectrometic curve is given by x 1 (λ 1 ), …, x n (λ 250 ). We used the notation x(780) without subscripts to denote the response of the band measured at a wavelength of 780 nm, x(670) to denote the response of the band measured at a wavelength of 670 nm, and so on. Note that early heat trial was planted in average 26 days earlier than the other trials; therefore, comparisons between individual time-points between heat trial and the others environments should consider data on the number of weeks since sowing. The comparison between the environments except early heat trial can be done more less fairly since the heading date ranged from 77 to 82 days after sowing, a period of five days in average. In all the environments heading date happened after the time-point four except in the Melgas environment in which heading date occurred at the same moment of time-point six.

Definition and computation of spectral vegetation indices
Eight different VIs were constructed with the 250 bands and are described in Table 1.

Adjusting the original data set
The lines were evaluated using an alpha-lattice design with three replicates and six incomplete blocks each, with five wheat lines randomly distributed within the incomplete block. This alpha-lattice design was established for each of the five environments. First, the design effect was removed in each environment and the BLUPs (Best Linear Unbiased Predictor) of genotypes for GY, for each of the 250 wavelengths and for each of the eight VIs were obtained in each of the nine time-points using the following model where μ is the overall mean,y ijkl is the response variable (GY, wavelength measure and VIs) for the ith genotype, jth trial, kth replicate, and lth block, g i is the random genetic effect of genotype i with normal distribution N 0, σ 2 g , t j is the random effect of trial j with normal distribution N 0, σ 2 t , r k(j) is the random effect of replicate k nested within trial j with normal distribution N 0, σ 2 r , b l(k,j) is the random effect of the incomplete block l nested within replicate k and trial j with normal distribution N 0, σ 2 b , and ǫ ijkl is the residual effect with normal distribution N 0, σ 2 e . After these pre-adjustments in each environment, we obtained BLUPs for each of the 1170 genotypes for GY, for each of the 250 bands and for each of the eight VIs. The BLUPs of genotypes were obtained for each of the nine time-points under study. Also, from fitting the alpha-lattice experimental model expressed above, we used the variance components of genotypes and of the error term to calculate the broad-sense heritability using the expression H 2 = σ 2 g σ 2 g +σ 2 e [10]; this was y ijkl = µ + g i + t j + r k(j) + b l(k,j) + ǫ ijkl , calculated for each of the 250 bands in each time-point in each environment.

Proposed single time-point models for the adjusted data
With the pre-adjusted data (BLUPs of genotypes for GY, for each of the 250 bands and the eight VIs), we propose to evaluate prediction accuracy for each time-point using the following statistical models: Model 1 Index ordinal least square (OLS) regression: All bands functional B-spline regression: All bands functional Fourier regression: where i = 1, …, n, with n = 1170, k = 1, . . . , K with K = 250, NPC denotes the number of principal components (PC) used and we used 6 (5, 10, 20, 35, 45 and 55).
x ik represents reflectance at the kth band collected in the ith genotype, PC il are the loadings of the lth PC on the ith genotype derived from the spectra data collected, z im is the mth index (RNDVI, GNDVI, SRa, RARSa, RARSb, RARSc, NPQI and PRI) derived from data collected at the ith genotype, while x i (k) is the functional predictor collected at the ith genotype and its corresponding functional data set is the sample [x 1 (k), . . . , x n (k)]. The error terms ǫ i were assumed to be independent with null mean and variance σ E 2 . α m , δ m , β k , γ k , are the regression coefficients for models 1, 2, 3 and 4, respectively, while β 1 (k), β 2 (k), β 3 (k) are the coefficient functions for functional models 5, 6 and 7, respectively.
The three proposed functional regression models (Models 5, 6 and 7) are the most popular functional regression models, where the responses are scalars and the covariates are functions. For this reason, the response variable (y i ) is a scalar in all the proposed models and represents grain yield (GY). Also, the difference between Models 5, 6 and 7 is the basis used for representing β o (k), with o = 1, 2, 3. Here a basis is understood to be a set of standard functions (φ w ) w∈N that are used to approximate any function of interest by a linear combination of a sufficiently large r w of these functions [23]. In Model 6, we assumed the B-spline basis; in Model 7, we used the Fourier basis; and in Model 8, the basis is the PLS. More details on the theory behind Models 5, 6, and 7 and their basis can be found in Ramsay and Silverman [23].
The parameter estimation of Models 1 and 2 was performed using OLS and implemented in the R software with the function lm() of the library MAS, while for Models 3 and 4, a Bayesian shrinkage-variable selection procedure (called Bayes B method) using a prior with a point of mass at zero and a t-slab was implemented in the BGLR R-package [20]. The functional models (Models 5, 6 and 7) were estimated with OLS and implemented in the R-package fda.usc [11] with 21 basis. First, models were fitted to the entire data set to evaluate goodnessof-fit to the training data and were then implemented through the cross-validation described in the next section. Using these 7 models, we created 19 methods (described in Table 2) according to the type of data they were applied to. The 19 methods were implemented in each of the five environments and per time-point.
It is important to point out that methods M1-M8 used only one of the 8 VIs, M9 used all 8 VIs simultaneously, M10-M19 used all 250 bands, but methods M11-M16 used all bands to perform a principal component analysis and then used different numbers of principal components (5, 10, 20, 35, 45, 55 PCs), as shown in Table 2. Additionally, methods M17 and M18 were implemented with only those bands whose heritability is >0.5.

Assessing prediction accuracy
For the prediction accuracies of the 19 proposed methods presented in Table 2, we implemented a ten-fold crossvalidation-with 1053 (90%) lines for training and 117 (10%) for testing in each fold-that was assessed by the Pearson correlation between the observed BLUPs of GY and their predicted values using the testing data set. We reported the average of the ten-fold cross-validation of the Pearson correlation (APC) as measure of prediction accuracy as well as the quantiles 2.5 (LL) and 97.5% (UL) (see "Appendices 1, 2"). It is important to point out that we used the same Split (of the ten-fold cross-validation) in the 19 methods to ensure fair comparisons between methods.

Results
The results are given in two sections: the first section presents the heritability estimates of each of the 250 wavelengths for each environment, while the second section presents the prediction accuracies estimated under the implemented methods.

Heritability estimates
The highest heritability estimates were found in the Irrigated ( Fig. 1b) and EarlyHeat ( Fig. 1c) environments, with values between 0.6 and 0.8 for most of the time-points. In these environments, heritability estimates are quite homogeneous across wavelengths, although in Irrigated, the lowest heritabilities were higher than 0.4 and were observed for wavelengths before 570 nm and those in the 580-700 nm range, while in EarlyHeat, the wavelengths with the lowest heritability were found before wavelengths of 480 nm and between wavelengths of 680-730 nm and all were higher than 0.4. On the other hand, the environment with the lowest heritability was Drought (Fig. 1b); the heritability before 450 nm and those in the 600-700 nm range are very low (around 0.2), while the rest of the bands with the highest heritability show values of around 0.6. The rest of the environments [Melgas (Fig. 1d) and Reduced Irrigated ( Fig. 9; "Appendix 3")] have intermediate heritability although they are very heterogeneous between time-points and across bands. For example, in the Melgas environment, we observed heterogeneity of heritabilities between time-points and across wavelengths and for wavelengths >750 nm for all time-points. While in Reduced Irrigated, the lowest heritabities (around 0.3) were observed in the 590-700 nm wavelength range, for six time-points ( Fig. 9; "Appendix 3").
In Early heat all time-points correspond to after heading stage while in the other trials time-points one to four were taken before heading, time points five and six during heading stage and after heading seven to nine time points. There is not a clear relationship between the heritability and the stage of the crop in which the images were taken.   9). In the Drought environment, the methods with the best prediction accuracy were those that used all the bands (M10-M19), while in the Irrigated environment in time-point 9, most of the methods that use all bands were the best in terms of prediction accuracy (methods M11-M19); however, in time-point 5, only methods M17 and M18 were better in terms of prediction accuracy than the methods that were built using the vegetation indices (M1-M9), but in time-point 1, methods M1, M3 and M4 built using the vegetation indices had the best prediction accuracy. In the EarlyHeat environment ( Fig. 2c), the methods with the best prediction accuracy were methods M17 and M18, which use all the available wavelengths, although it is important to point out that time-point 1 was better than time-points 5 and 9 in methods M10 to M16, which use all the bands. Also, in the Melgas (Fig. 2d) and Reduced Irrigated environments ( Fig. 10; "Appendix 3"), methods M17 and M18 had the best prediction accuracy. However, in these environments the best prediction was observed in time-point 9 and the worst, in time-point 1. Appendices 1 and 2 show the rest of the prediction accuracies for time-points 2, 3, 4, 6, 7 and 8 for all methods.

Comparing some methods for all time-points
Compared in this section are methods M1, M2, M9, M10, M17, M18, and M19. Methods M1 and M2 were chosen because they were built using two of the most widely used vegetation indices (RNDVI, GNDVI), whereas M9 uses all 8 VI simultaneously; M10 uses all bands and Bayes B. Methods M17 and M18 were included because they provided the best prediction accuracies in all the environments using all bands, while method M19 was used because it performed well in the last section using all bands. Drought (Fig. 3), Melgas (Fig. 4) and Reduced Irrigated ( this nomenclature is used in the rest of the manuscript. It is important to point out that the prediction accuracies of time-point 8 were considerably lower than those of timepoints 7, 9, and 6. In the Irrigated environment (Fig. 3), a similar trend is observed, yet for some methods (M17 and M18), time-point 7 provides the best predictions. In this environment (Irrigated), the differences in prediction accuracy between time-point 5 and time-points 6, 7, 8, 9, 79, 89, 789, and 6789 are not strong. This indicates that even with time-point 5, we can generate good prediction accuracies for grain yield. In the EarlyHeat environment (Fig. 4), all time-points produced good predictions, although methods M1 and M2 produced lower predictions in time-points 7, 9, 79, 89, 789 and 6789. It is important to point out that methods M17 and M18 were the best in all time-points in all environments, although in environments EarlyHeat and Melgas, the superiority of these methods is clearer.

Comparing environments for time-points 5 and 9
Figures 5 and 6 show that there are differences in prediction accuracy between environments. In time-point 5 ( Fig. 5), EarlyHeat was the environment with the best predictions, followed by Irrigated and Drought, while the worst predictions were observed in Melgas and Reduced Irrigation. In time-point 3 ( Fig. 6), the behavior was similar to that of time-point 5, since EarlyHeat was also the best in terms of prediction accuracy; however, here Melgas was the worst and the other three environments were in the middle. In time-points 7 (Fig. 6) and 9 (Fig. 5), the pattern was different since here the best predictions were in the Drought environment, and the second best was EarlyHeat, since in four of the seven methods presented in Fig. 5, this environment had the second best predictions. In third place is the Irrigated environment, while the worst predictions were observed in Melgas and Reduced Irrigated. It is important to point out that methods M17 and M18 were consistently the best in the five environments. Furthermore, it should be noted that the planting date in EarlyHeat is around 5 weeks earlier than the planting dates in the other four environments; thus the comparison of prediction performance at the same time-points does not represent a comparison at the same crop development stage. We observe in Drought that when only the bands with heritabilities >0.5 were used, prediction accuracies were better than when using all bands for both methods (M17 and M18). However, in the Irrigated environment using all bands, prediction accuracies were slightly better than when using only the bands with heritability >0.5; however, the difference is not relevant.  In Fig. 8, we observe that in both environments (Ear-lyHeat and Melgas), using all the bands was a little better than using only the bands with heritabilities >0.5, although the differences were not significant. The same pattern is observed for the Reduced Irrigated environment ( Fig. 12; "Appendix 3").

Heritability estimates of the bands
Results indicated that the heritabilities of each wavelength are not homogeneous across groups. The Irrigated and EarlyHeat environments had the highest heritabilities (with values between 0.6 and 0.8), which were homogenous across wavelengths, while Drought had the lowest heritabilities (with lowest values around 0.2), which were heterogeneous across wavelengths and time-points. Results in "Comparing vegetation indices vs all bands" section indicate that using all bands simultaneously as explanatory variables produced better prediction accuracies that using the VIs alone or combined. However, predictions were better when using only those bands with heritabilities >0.5 compared with using all bands only in Drought, while in the other 4 environments, using all bands produced slightly better prediction accuracies ("Comparing methods M17 and M18 using all the bands and bands with heritabili-ties>0.5" section). Therefore, the evidence indicates that using all bands simultaneously provided better prediction accuracies than using the VI alone or combined and even than using those bands with heritabilities >0.5. However, it is important to point out that the methods that used VI alone as predictor variables are very heterogeneous in terms of prediction accuracy, since some performed very poorly, while others produced reasonable predictions (for example, M6).

Prediction accuracy of the methods
Since we now have enough evidence to say that using all bands produced better predictions than using individual, combined VI and even when we restrict the models to less noisy features (H2 > 0.5), we compared the methods that used all the bands. Based on the prediction accuracy of the methods, results indicate that for this data set, methods M17 and M18 are the best for prediction. These two methods were better in all environments and in most of the nine time-points, and were also considerably better than the PC methods (M11 to M16), the Bayes B method (M10), and a little better than the functional PLS method (M19). The best two methods (M17 and M18) are functional regression models and correspond to models 5 and 6 described in "Proposed single time-point models for the adjusted data" section. Functional regression models nowadays have become an increasingly important statistical tool when the number of covariates is larger than the number of observations, where the unit of observation is generally viewed as a function or a curve defined based on some underlying continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid.
Given the nature of our data, the functional regression that we implemented only considered functional predictors; however, this regression method can also be used when both the predictors and the responses are functions. For this reason, functional regression models have been implemented successfully in many research areas (spectroscopy, economics, environmental studies, bioscience, system engineering, etc.). Functional regression is also very attractive because it is a non-destructive technology that measures numerous chemical compounds in a variety of products (plant, soil, food, petroleum, wood products, etc.) and can be used in large databases in experimental and non-experimental settings.

Prediction accuracy for time-points
Regarding the prediction accuracy for time-points, in general, prediction accuracies before time-point 6 were poor in four environments, and all time-points produced good predictions only in the EarlyHeat environment; a likely explanation for this may be that in this environment the sowing date was around 5 weeks earlier than the sowing dates in the other 4 environments, that is, the development of the crop for all time-points was more advanced in EarlyHeat. For this reason, the empirical evidence indicates that, for this dataset, timepoint 6 achieved good prediction accuracy. Also, in general, time-point 6 predictions are better than time-point 8 predictions. However, we need to be careful when interpreting time-point 6, since sowing time was different in each environment and the plants were at different growth stages when the bands were measured. Using this time-point can be helpful for breeders, since it is around 28 days before time-point 9. Also, it is important to point out that the predictions of the average time-points under study (79, 89, 789, and 6789) are a little better than those of time-points 6, 7 and 9 in methods that used all the bands; however, the increase in prediction accuracy is not large.

Conclusions
In this research, we proposed using all the bands simultaneously as predictor variables instead of using only one VI alone or all the VI together. First, we found that the heritabilities of the bands were heterogeneous across time-points and environments and that the best heritabilities were observed in the Irrigated and EarlyHeat environments and the worst in the Drought environment. We found that using all the bands simultaneously produced better predictions than using one VI alone or all the VI together. When we used only the less noisy bands (H2 > 0.5) in Drought, the predictions improved, while in the rest of the environments the results were similar. Out of the methods that used all the bands, the best methods across time-points and environments were M17 and M18 (functional B-spline and Fourier, respectively). Also, time-point 6 and 8 predictions were slightly lower than those of time-points 7 and 9, yet were close enough to be used for the prediction of wheat lines before harvesting. Finally, results show that the approach used to analyze high-resolution image data used in this study is promising; however, it is also clear that its application in this context is not straightforward, since the Bayes B method, which is popular for genomic selection, did not produce the best predictions. There are many challenges that need to be considered in future research using functional regression models, such as the inclusion of genotype × environment interaction, random effects, traits not normally distributed and multiple traits as response variables. Also, other conventional methods (GBLUP, Bayes A, Ridge Regression, Bayes C) used in genomic-enabled prediction should be tested in the context of high-resolution imaging data.

Authors' contributions
OAML and AML performed all analysis. JR and MS designed the research and collected the data. OAML, JC, GA, GdC and JB were involved in drafting and writing the manuscript. All authors read and approved the final manuscript. 1 International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 Mexico City, Mexico. 2

Source of funding
Funding was provided by "Centro Internacional de Mejoramiento de Maíz y Trigo".

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
The data and materials used in this study can be downloaded from the link: http://hdl.handle.net/11529/10693. The links contains file corresponding to the phenotypic and bands data for each environments, Drought.Phe_and_ Bands.RData, EarlyHeat.Phe_and_Bands.RData, Irrigated.Phe_and_Bands. RData, Irrigated.Phe_and_Bands.RData.