Assessment of Ensemble Learning to Predict Wheat Grain Yield Based on UAV-Multispectral Reflectance

Abstract: Grain yield is increasingly affected by climate factors such as drought and heat. To develop resilient and high-yielding cultivars, high-throughput phenotyping (HTP) techniques are essential for making precise decisions in wheat breeding. This study evaluates the ability of unmanned aerial vehicle (UAV)-based multispectral imaging and ensemble learning methods to increase the accuracy of grain yield prediction in practical breeding work. For this, 211 winter wheat genotypes were planted under full and limited irrigation treatments, and multispectral data were collected at the heading, flowering, early grain filling (EGF), and mid-grain filling (MGF) stages. Twenty multispectral vegetation indices (VIs) were estimated, and VIs with heritability greater than 0.5 were selected to evaluate the models across growth stages under both irrigation treatments. An ensemble learning framework was developed by combining multiple base models, namely random forest (RF), support vector machine (SVM), Gaussian process (GP), and ridge regression (RR). The R² values between VIs and grain yield for the individual base models ranged from 0.468 to 0.580 under full irrigation and from 0.537 to 0.598 under limited irrigation across growth stages. The prediction results of the ensemble models ranged from 0.491 to 0.616 and 0.560 to 0.616 under full and limited irrigation treatments, respectively, and were higher than those of the corresponding base learners. Moreover, grain yield prediction was highest at the mid-grain filling stage under both full (R² = 0.625) and limited (R² = 0.628) irrigation treatments through ensemble learning-based stacking of the four base learners. Further improvements in ensemble learning models can accelerate the use of UAV-based multispectral data for accurate prediction of complex traits like grain yield in wheat.


Introduction
Bread wheat is an important food crop that meets the energy needs of more than a quarter of the world population [1]. Sustainable wheat production is largely determined by the adaptability of cultivars to various stress environments. Drought and heat stress are major [...] 211 winter wheat genotypes were used, of which 186 accessions were collected from the Yellow and Huai Valleys Winter Wheat Zone (YHVWWZ) of China and 25 from five other countries.
The experiment was conducted at Xinxiang (35°18′ N, 113°52′ E), Henan province of China, during the 2019–2020 cropping season (Figure 1). The genotypes were planted under two irrigation treatments (full irrigation and limited irrigation) in a randomized complete block design (RCBD) with two replications. A total of 844 plots were phenotyped under both irrigation treatments using the UAV platform. Each plot consisted of six rows with a length of 3 m, a width of 1.4 m, and an inter-row spacing of 0.2 m. The trials were sown on 26 October 2019 at a seeding rate of 270 seedlings/m² and harvested on 2 June 2020. Both treatments received equal irrigation at the tillering stage, while the full irrigation treatment was additionally irrigated at the heading and early grain filling stages with 2250–2700 m³ ha⁻¹ of water. Fertilization and management were identical for both treatments. Seasonal precipitation was 110.2 mm, and harvesting was done at full maturity using a combine harvester.

UAV Platform and Flight Mission
A RedEdge MX multispectral sensor (MicaSense, Seattle, WA, USA) (https://www.micasense.com/parrotsequoia) (Accessed: 30 November 2020) was mounted on a DJI M210 (SZ DJI Technology Co., Shenzhen, China) for multispectral imagery. The built-in GPS of the camera provided the position and orientation of the images for data georeferencing. The RedEdge MX can simultaneously collect images of five different bands with GPS information at specific intervals and with the same resolution. The half-maximum bandwidths of the blue, green, red, red-edge, and near-infrared bands were 32 nm, 27 nm, 14 nm, 12 nm, and 57 nm, respectively. Detailed information on the band wavelengths is listed in Table 1. Flights were conducted in clear and cloudless weather conditions between 11:00 a.m. and 2:00 p.m. The UAV flew over the trial area in the fully automatic flight mode set by the DJI GS Pro software (https://www.dji.com/cn/groundstation-pro) (Accessed: 1 December 2020). Images for radiometric calibration of the RedEdge MX were captured on flat ground before and after each flight using a calibrated reflectance panel provided by MicaSense. To obtain high-resolution images, the forward and side image overlaps were set to 85% and 80%, respectively. The flight altitude was maintained at 40 m for the heading and flowering stages and at 30 m for the early and mid grain filling stages. The ground resolution was around 3 cm for images taken at an altitude of 40 m and 2.5 cm at 30 m. Detailed information on the UAV-based data acquisition is presented in Table 2. In total, around 2000 to 2500 multispectral images were captured from each flight to make mosaic images. Pix4Dmapper (Version 1.4, Pix4D, Lausanne, Switzerland) was used for orthomosaic generation after each aerial survey of the experimental field. Plot segmentation for each image was done in QGIS 3.1.0 (https://www.qgis.org/) (Accessed: 30 November 2020).
The polygon shapes with a specific ID were designed as masks to extract the spectral data of each plot. The spectral information was extracted using the computer vision algorithms provided by the ENVI software [5].
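The mask-based extraction of per-plot reflectance can be sketched in a few lines of numpy; the 4 × 4 band array and plot IDs below are purely illustrative toy values, and the ENVI/QGIS specifics are not reproduced here.

```python
import numpy as np

def extract_plot_means(band, mask, plot_ids):
    """Mean reflectance of one spectral band for each rasterized plot mask."""
    return {pid: float(band[mask == pid].mean()) for pid in plot_ids}

# toy single-band reflectance image with two rasterized plot masks (IDs 1 and 2)
band = np.array([[0.20, 0.20, 0.40, 0.40],
                 [0.20, 0.20, 0.40, 0.40],
                 [0.00, 0.00, 0.00, 0.00],
                 [0.00, 0.00, 0.00, 0.00]])
mask = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [0, 0, 0, 0],   # 0 = background (soil, alleys)
                 [0, 0, 0, 0]])
plot_means = extract_plot_means(band, mask, [1, 2])
```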

Vegetation Indices and Ground Data
As listed in Table 3, twenty vegetation indices were calculated as secondary traits to predict grain yield. Most of the VIs contain near-infrared and red bands, which have been used to quantify biomass, pigment content, and other traits in many crops. Grain yield was measured for each plot after harvest at full maturity.
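As an illustration, three of the listed VIs (NDVI, GNDVI, and NDRE) can be computed directly from the plot-mean band reflectances; the reflectance values below are made-up examples, not measurements from this trial.

```python
def ndvi(nir, red):        # normalized difference vegetation index
    return (nir - red) / (nir + red)

def gndvi(nir, green):     # green NDVI
    return (nir - green) / (nir + green)

def ndre(nir, red_edge):   # normalized difference red edge
    return (nir - red_edge) / (nir + red_edge)

# hypothetical plot-mean reflectances for four of the five RedEdge MX bands
nir, red, green, red_edge = 0.45, 0.05, 0.08, 0.20
vis = {"NDVI": ndvi(nir, red),
       "GNDVI": gndvi(nir, green),
       "NDRE": ndre(nir, red_edge)}
```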

Stacking Regression Models for Ensemble Learning
Ensemble learning is a machine learning paradigm in which multiple base learners are integrated to solve regression or classification problems. Stacking regression is an ensemble learning model proposed by Wolpert (1992) [40] to blend predictors and improve prediction accuracy. It is commonly used to generate ensembles of heterogeneous predictors. In this study, four regression models, random forest (RF) [24], support vector machine (SVM) [25], Gaussian process (GP) [26], and ridge regression (RR) [27], were integrated for stacking regression-based ensemble learning. The R package "caret" (version 6.0-86) in R 4.0.2 (https://CRAN.R-project.org/package=caret) (Accessed: 30 November 2020) was used to build the base learners and the stacking regression framework. The principle of stacking regression is illustrated in Figure 2. The data pairs (canopy multispectral reflectance and grain yield) were randomly and evenly split into ten parts, one of which was held out for testing. Predictions for each fold were made by training the model and performing tenfold cross-validation, from which an out-of-sample prediction matrix (OSPM) was obtained. During the tenfold cross-validation, grain yield predictions of each regression model were generated separately to check the results of the base learner on the test set before they were averaged. An OSPM with a dimension of m × n (where m is the number of base learners and n is the number of samples in the training set) was obtained after the m base learners completed this process. The OSPM was then used to train the level-2 regression model to make the final predictions. Multiple linear regression (MLR) was used as the level-2 model to avoid collinearity among the prediction results for grain yield. Tenfold cross-validation was also conducted for the level-2 model to reduce uncertainty in the prediction results.
Notably, the same split process (with tenfold cross-validation) was applied to all models to ensure fair comparisons between methods. To reduce uncertainty in the results, the process of dividing the data into training and test sets was repeated 20 times with tenfold cross-validation. This process generated 200 models, and the average prediction accuracy of these 200 models on the test set was taken as the final evaluation index.
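The stacking workflow described above (k-fold out-of-sample predictions from each base learner, then an MLR level-2 model) can be sketched as follows. To keep the example self-contained, it uses two closed-form ridge learners with different penalties in place of the four caret models, on synthetic data; all names, penalties, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution with an unpenalized intercept column."""
    Xb = np.c_[np.ones(len(X)), X]
    P = np.eye(Xb.shape[1]); P[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + lam * P, Xb.T @ y)

def ridge_predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def stack(X, y, base_lams=(0.1, 10.0), k=10):
    """Level 1: k-fold out-of-sample predictions for each base learner (OSPM).
    Level 2: multiple linear regression on the OSPM."""
    ospm = np.zeros((len(X), len(base_lams)))
    for fold in np.array_split(rng.permutation(len(X)), k):
        train = np.setdiff1d(np.arange(len(X)), fold)
        for j, lam in enumerate(base_lams):
            w = ridge_fit(X[train], y[train], lam)
            ospm[fold, j] = ridge_predict(w, X[fold])
    meta_w = ridge_fit(ospm, y, lam=1e-8)  # tiny penalty only for stability
    return meta_w, ospm

X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)
meta_w, ospm = stack(X, y)
level2_pred = ridge_predict(meta_w, ospm)
```

Each column of the OSPM holds one base learner's out-of-sample predictions, so the level-2 fit never sees in-sample predictions of its own inputs.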

Random Forest
Random forest (RF) regression is a machine learning model that combines a large number of regression trees [24]; the final prediction is the average of the outputs of all trees. Each regression tree represents a set of conditions or restrictions and is constructed from a bootstrap sample of the training set. This construction strategy overcomes the tendency of a single complex tree to overfit the data. The key step in constructing an RF is splitting the regression trees, where the splitting criterion selects the input variable with the lowest Gini index.
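The bootstrap-and-average construction can be illustrated with a miniature forest of depth-one regression trees (stumps). Note that for regression the split score below is the weighted within-node variance rather than the Gini index used in classification, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def stump_fit(X, y):
    """Best single split (feature j, threshold t) by lowest weighted variance."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = left.var() * len(left) + right.var() * len(right)
            if best is None or score < best[0]:
                best = (score, j, t, left.mean(), right.mean())
    return best[1:]          # (feature, threshold, left mean, right mean)

def stump_predict(m, X):
    j, t, lo, hi = m
    return np.where(X[:, j] <= t, lo, hi)

def forest_fit(X, y, n_trees=25):
    models = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))   # bootstrap resample
        models.append(stump_fit(X[idx], y[idx]))
    return models

def forest_predict(models, X):
    # final prediction = average over all trees
    return np.mean([stump_predict(m, X) for m in models], axis=0)

X = rng.uniform(size=(80, 2))
y = (X[:, 0] > 0.5).astype(float)   # step function of the first feature
forest = forest_fit(X, y)
pred = forest_predict(forest, X)
```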

Support Vector Machine
Support vector machine (SVM) is derived from statistical learning theory and the structural risk minimization principle, and it is widely applied in data analysis and pattern recognition [25]. The purpose of SVM is to minimize the error by fitting a hyperplane that maximizes the margin between positive and negative samples in the training set. The introduction of the ε-insensitive loss function allows SVM to solve nonlinear regression problems. Support vector regression can be defined as f(x) = Σᵢ aᵢ k(xᵢ, x) + b, where aᵢ represents the coefficient of the additional hyperplane alongside the regression line, k(xᵢ, x) is the kernel function, and b is the bias.
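A full ε-insensitive SVR fit requires quadratic programming, so as a simplified sketch of the kernel expansion f(x) = Σᵢ aᵢ k(xᵢ, x) + b, the coefficients below are instead obtained by regularized least squares (kernel ridge) with an RBF kernel; the data and hyperparameters are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=20.0):
    """Radial basis kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_fit(X, y, lam=1e-3, gamma=20.0):
    """Solve (K + lam I) a = y - b, with the bias b fixed at the target mean."""
    K = rbf_kernel(X, X, gamma)
    b = y.mean()
    a = np.linalg.solve(K + lam * np.eye(len(X)), y - b)
    return a, b

def kernel_predict(a, b, Xtrain, Xnew, gamma=20.0):
    # f(x) = sum_i a_i k(x_i, x) + b
    return rbf_kernel(Xnew, Xtrain, gamma) @ a + b

X = np.linspace(0.0, 1.0, 40)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
a, b = kernel_fit(X, y)
fit = kernel_predict(a, b, X, X)
```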

Gaussian Process
Gaussian process (GP) is a type of probabilistic kernel machine based on Bayesian and statistical learning theory [26] and has been extensively used in machine learning. In probability theory and mathematical statistics, a GP describes observed values over a continuous domain (such as time or space) and can be regarded as a distribution over, and a means of inference in, a function space. In the prediction process, GP maximizes the type-II (marginal) likelihood of the observations, adjusts the hyperparameters accordingly, and computes the posterior distribution of unknown observations. When the observed variable space of a GP is real-valued, regression can be conducted to make predictions.
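The GP prediction step, computing a posterior mean and variance from the kernel, can be sketched with fixed hyperparameters (the marginal-likelihood optimization of hyperparameters is omitted; kernel settings and data are illustrative):

```python
import numpy as np

def rbf_kernel(A, B, gamma=20.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def gp_posterior(Xtr, ytr, Xte, gamma=20.0, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(Xtr, Xtr, gamma) + noise * np.eye(len(Xtr))
    Ks = rbf_kernel(Xte, Xtr, gamma)
    Kss = rbf_kernel(Xte, Xte, gamma)
    mean = Ks @ np.linalg.solve(K, ytr)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

Xtr = np.linspace(0.0, 1.0, 40)[:, None]
ytr = np.sin(2.0 * np.pi * Xtr[:, 0])
mean_tr, var_tr = gp_posterior(Xtr, ytr, Xtr)                  # at training inputs
mean_far, var_far = gp_posterior(Xtr, ytr, np.array([[2.0]]))  # far from the data
```

Away from the training data the posterior variance reverts to the prior, which is what makes GP predictions naturally uncertainty-aware.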

Ridge Regression
Ridge regression was proposed by Tikhonov in 1943 [27] and generalized by Hoerl and Kennard in 1970 [62]. It shrinks the regression coefficients by penalizing or constraining their possible values. Specifically, ridge regression minimizes the sum of squared errors plus an L2-norm penalty on the size of the parameter estimates. The coefficient estimates are

β̂ = argmin_β { Σᵢ₌₁ᴺ (yᵢ − Σⱼ₌₁ᵖ xᵢⱼβⱼ)² + λ Σⱼ₌₁ᵖ βⱼ² },

where λ controls the amount of shrinkage; N represents the number of observations; y is the dependent variable; and p and βⱼ are the number of independent variables and the value of the jth coefficient, respectively.
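The shrinkage behavior of this estimator can be checked numerically; the following is a minimal closed-form sketch on synthetic data, not the caret configuration used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)

def ridge_coefs(X, y, lam):
    """Closed-form ridge estimate on centered data (intercept handled by centering)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

X = rng.normal(size=(60, 4))
beta_true = np.array([2.0, -1.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=60)

b_small = ridge_coefs(X, y, lam=0.01)   # almost ordinary least squares
b_big = ridge_coefs(X, y, lam=100.0)    # heavy shrinkage toward zero
```

Increasing λ shrinks the coefficient vector toward zero, trading a little bias for lower variance on collinear inputs.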

Cross-Validation and Hyperparameter Tuning
At level 1 of the stacking regression, tenfold cross-validation was used to form the sample matrix; this procedure is considered the outer cross-validation. Meanwhile, inner cross-validation with grid search was conducted to fine-tune the hyperparameters of the base learners, as shown in Figure 2. In the outer cross-validation, the original data set was randomly divided into 10 equal subsets (Figure 3). Each time, one subset was used for validation and the remaining nine for training. Each outer training set was in turn randomly and evenly split into 10 sets, with 10% of the data in the inner validation set and the remaining 90% in the inner training set. In the inner cross-validation, different combinations of candidate hyperparameters were used to construct the model on the inner training set, and the constructed model was validated on the inner validation set. Each hyperparameter combination was validated 10 times, and the combination with the highest average validation accuracy on the outer training set was transferred to the outer cross-validation to train the final model. The detailed hyperparameters of each machine learning model are listed in Table 4. ntree is the number of regression trees and mtry is the number of input variables per node; cost controls the trade-off between minimizing the model's complexity and minimizing the training error; gamma is the parameter of the radial basis kernel function and determines the distribution of the data mapped to the new feature space; lambda is the model penalization value; sigma is the inverse kernel width of the radial basis kernel function. RF, random forest; SVM, support vector machine; GP, Gaussian process; RR, ridge regression.
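The nested (outer/inner) cross-validation with grid search can be sketched as follows; for brevity a single ridge learner with a three-value penalty grid stands in for the four base learners, and the fold counts, grid, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_ridge(X, y, lam):
    Xb = np.c_[np.ones(len(X)), X]
    P = np.eye(Xb.shape[1]); P[0, 0] = 0.0   # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + lam * P, Xb.T @ y)

def predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def r2(y, yhat):
    return 1.0 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def nested_cv(X, y, grid=(0.01, 1.0, 100.0), k_outer=5, k_inner=5):
    outer_scores = []
    for test_idx in np.array_split(rng.permutation(len(X)), k_outer):
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        # inner cross-validation: grid search over the penalty parameter
        best_lam, best_score = grid[0], -np.inf
        for lam in grid:
            scores = []
            for val_idx in np.array_split(np.arange(len(Xtr)), k_inner):
                fit_idx = np.setdiff1d(np.arange(len(Xtr)), val_idx)
                w = fit_ridge(Xtr[fit_idx], ytr[fit_idx], lam)
                scores.append(r2(ytr[val_idx], predict(w, Xtr[val_idx])))
            if np.mean(scores) > best_score:
                best_score, best_lam = np.mean(scores), lam
        # refit on the whole outer training set with the selected hyperparameter
        w = fit_ridge(Xtr, ytr, best_lam)
        outer_scores.append(r2(y[test_idx], predict(w, X[test_idx])))
    return float(np.mean(outer_scores))

X = rng.normal(size=(120, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.2, size=120)
score = nested_cv(X, y)
```

The inner loop never touches the outer test fold, so the outer score is an unbiased estimate of the tuned model's performance.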

Model Performance Evaluation
To evaluate model performance for grain yield prediction, the coefficient of determination (R²) and root mean square error (RMSE) were calculated as follows:

R² = 1 − Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²,
RMSE = √( Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / n ),

where n is the number of samples; yᵢ and ŷᵢ are the measured and predicted grain yield of sample i, respectively; and ȳ represents the mean of the measured grain yield. A model with a higher R² and lower RMSE predicts grain yield better.
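The two evaluation metrics translate directly from their definitions; this short sketch mirrors the symbols above, with made-up example vectors.

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = ((y - yhat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def rmse(y, yhat):
    """Root mean square error of the predictions."""
    return float(np.sqrt(((y - yhat) ** 2).mean()))

y = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([2.0, 2.0, 3.0, 3.0])
```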

Statistical Analysis
A mixed linear model was used to test the significance of variation between genotypes, irrigation treatments, and their interactions for each vegetation index and grain yield. The model is

Y = Xβ + Zµ + ε,

where Y represents the response explained by the fixed effects (β) and random effects (µ) with random error (ε), and X and Z are the design matrices of the fixed and random effects, respectively. Heritability was estimated as

H² = σg² / (σg² + σε²/r),

where r is the number of replicates per treatment, and σg² and σε² indicate the genotypic and error variances, respectively [63]. Spectral traits with low heritability were considered noise features in previous studies [4]. In this study, VIs with heritability less than 0.5 were not used as input features to construct the grain yield prediction models.
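The heritability formula translates directly to code; the variance components below are hypothetical values, not estimates from this trial.

```python
def heritability(sigma_g2, sigma_e2, r):
    """Broad-sense heritability H^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2 / r)."""
    return sigma_g2 / (sigma_g2 + sigma_e2 / r)

# hypothetical components: equal genotypic and error variance, 2 replicates
h2 = heritability(1.0, 1.0, r=2)
```

With more replicates the error variance is averaged down, so H² increases toward 1.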

Phenotypic Analysis
The distribution of the averaged grain yield data of the two irrigation treatments is given in Figure 4. Grain yield under the limited irrigation treatment was 8% lower than under the full irrigation treatment. As listed in Table A5, the coefficient of variation (CV) under full irrigation was 15.7%, slightly higher than that under limited irrigation (14.6%). The analysis of variance (ANOVA) revealed significant variation (p < 0.001) among the genotypes for all vegetation indices (VIs) across the growth stages and for grain yield. In addition, there was a significant difference between the two irrigation treatments for all traits (Tables A1-A5). Heritability (H²) was high at the mid-grain filling stage, ranging from 0.60 to 0.91 for all VIs (Tables A1-A4). VIs with heritability greater than 0.5 were selected as input features for predicting wheat grain yield at a particular growth stage. Under full irrigation, the numbers of VIs with heritability greater than 0.5 were 17, 14, 18, and 20 for the heading, flowering, EGF, and MGF stages, respectively, while under limited irrigation the corresponding numbers were 19, 19, 18, and 20 (Figure 5).

Performance of Base Learners for Grain Yield Prediction
Under full irrigation treatment, the individual base learners achieving the best yield prediction results for the four growth stages were as follows: RR with a mean R² of 0.488 and mean RMSE of 0.859 t ha⁻¹ for the heading stage; GP with a mean R² of 0.520 and mean RMSE of 0.831 t ha⁻¹ for the flowering stage; RF with a mean R² of 0.602 and mean RMSE of 0.711 t ha⁻¹ for the early grain filling stage; and RR with a mean R² of 0.604 and mean RMSE of 0.777 t ha⁻¹ for the mid grain filling stage (Figure 6 and Table 5). SVM did not achieve the best predictions for any of the growth stages. Under limited irrigation treatment, the best base learners were as follows: RR with a mean R² of 0.561 and mean RMSE of 0.699 t ha⁻¹ for the heading stage; RR with a mean R² of 0.611 and mean RMSE of 0.660 t ha⁻¹ for the flowering stage; RF with a mean R² of 0.599 and mean RMSE of 0.668 t ha⁻¹ for the early grain filling stage; and RF with a mean R² of 0.626 and mean RMSE of 0.641 t ha⁻¹ for the mid grain filling stage. In addition, the prediction results under limited irrigation were slightly higher than those under full irrigation. The ensemble models with prediction accuracy lower than their base models are underlined in Table 5. Abbreviations: RF, random forest; SVM, support vector machine; GP, Gaussian process; RR, ridge regression. As shown in Figures A1 and A2, the Pearson's correlations between the grain yield predictions generated by the machine learning algorithms across the four growth stages and two irrigation treatments were high (r = 0.81-0.97). By contrast, the density curves of the grain yield predictions differed slightly from each other, and the base learners differed significantly in the distribution interval of the prediction accuracy.

Ensemble Approach for Grain Yield Prediction
To evaluate the performance of the stacking method, grain yield predictions were obtained using different combinations of the base learners for the four growth stages and two irrigation treatments. The results obtained by combining two of the base learners are presented in Figure 7 and Table 5. The model combinations achieving the best mean prediction results for the four growth stages under full irrigation treatment were as follows: RF-RR with a mean R² of 0.502 and mean RMSE of 0.843 t ha⁻¹ for the heading stage; GP-RR with a mean R² of 0.551 and mean RMSE of 0.808 t ha⁻¹ for the flowering stage; RF-GP with a mean R² of 0.620 and mean RMSE of 0.709 t ha⁻¹ for the early grain filling stage; and GP-RR with a mean R² of 0.622 and mean RMSE of 0.752 t ha⁻¹ for the mid grain filling stage. Under limited irrigation treatment, the best combinations were as follows: GP-RR with a mean R² of 0.570 and mean RMSE of 0.691 t ha⁻¹ for the heading stage; RF-RR with a mean R² of 0.619 and mean RMSE of 0.643 t ha⁻¹ for the flowering stage; RF-GP with a mean R² of 0.618 and mean RMSE of 0.648 t ha⁻¹ for the early grain filling stage; and RF-RR with a mean R² of 0.629 and mean RMSE of 0.639 t ha⁻¹ for the mid grain filling stage. Figure 9 shows the performance of the stacking regression combining all four base machine learning models for grain yield prediction across the four growth stages. Under full irrigation treatment, the mean R² values of the stacking regression improved to 0.498, 0.538, 0.622, and 0.620 for the heading, flowering, early grain filling, and mid-grain filling stages, respectively, while the mean RMSE values were reduced to 0.851 t ha⁻¹, 0.820 t ha⁻¹, 0.709 t ha⁻¹, and 0.713 t ha⁻¹, respectively.
Similar findings were observed under limited irrigation treatment: the mean R² values were up to 0.562, 0.617, 0.611, and 0.628, and the mean RMSE values were 0.702 t ha⁻¹, 0.647 t ha⁻¹, 0.651 t ha⁻¹, and 0.642 t ha⁻¹ for the heading, flowering, early grain filling, and mid-grain filling stages, respectively. We estimated the spatial distribution of grain yield at the plot scale based on the ensemble model combining the four base learners at mid grain filling; the difference in grain yield under the two irrigation treatments can be observed directly in Figure A3. In terms of mean R² and RMSE, most of the ensemble models achieved higher prediction accuracy than the best individual models (Table 5), confirming the effectiveness of the ensemble model implemented in this study. In addition, the overall accuracy indicated that the prediction performance of the ensemble model was proportional to the number of base learners (Figure 10). Figure 11 illustrates the distribution of the regression coefficients of the four base learners within the secondary model (MLR); a higher regression coefficient for a particular base learner indicates a larger weight in the stacking procedure. Under full irrigation treatment, the stacking performance for the heading stage was strongly influenced by the RR and GP models, with high mean regression coefficients of 0.89 and 0.61, respectively. Similar results were observed for the flowering stage, where the mean regression coefficients of RR and GP were 0.59 and 0.51, respectively. The impacts of the RF and SVM models on the stacking regression were relatively small, with mean regression coefficients of −0.01 and −0.02, respectively.
RF had the highest impact on the stacking performance for the early grain filling stage, with a mean regression coefficient of 0.50, followed by SVM (0.33), GP (0.21), and RR (0.03). For the mid-grain filling stage, the RF and RR models had similar weights, with mean regression coefficients of 0.46 and 0.45, respectively, followed by GP (0.08) and SVM (0.03). Under limited irrigation, RF showed the strongest influence on the stacking performance at the heading stage, with a mean regression coefficient of 0.61, while GP, RR, and SVM had mean regression coefficients of 0.31, 0.14, and −0.06, respectively. For the flowering, early grain filling, and mid-grain filling stages, the mean regression coefficients were 0.53, 0.55, and 0.05 for RF; 0.41, 0.52, and 0.50 for RR; 0.24, 0.39, and 0.13 for SVM; and −1.14, −0.41, and 0.36 for GP, respectively. Overall, these results indicate that the stacking regression achieved higher prediction accuracy by allocating more reasonable weights to the base learners under various modeling conditions.

Discussion
UAV-based multispectral vegetation indices have been increasingly exploited to predict plant physiological traits in crop breeding studies [5,64]. In this study, 20 VIs covering all five spectral bands captured by the multispectral sensor were used to evaluate the machine learning models for grain yield prediction. The high heritability of the VIs indicates that these traits were measured with the accuracy and repeatability needed for grain yield prediction (Tables A1-A4).
Among these VIs, GNDVI, NDRE, OSAVIREG, MSRREG, and MTCI had heritability greater than 0.5 under both treatments and across the growth stages. These five VIs could therefore be selected as the best predictors of genotypic variation in grain yield. Previous studies have reported that GNDVI and NDRE were the best predictors of grain yield and nutrient uptake efficiency across growth stages [5,64]. Therefore, these VIs can be used to select high-yielding genotypes with high accuracy in large breeding programs. Regarding water stress under different irrigation treatments, thermal imagery has been found to detect minor changes in water stress that are undetectable by multispectral indices such as the normalized difference vegetation index (NDVI) [65]. Thermal imagery can help diagnose water stress in plants: stomatal closure reduces the transpiration rate, and the resulting loss of evaporative cooling increases leaf temperature. In this regard, canopy thermal imaging represents a fast and practical way to evaluate and estimate crop water status, indicating plant water content. Canopy temperature has also been reported to correlate strongly with grain yield at early grain filling stages but less so at later grain filling stages [66], which might be due to low greenness and photosynthetic activity at late maturation. When wilting starts and transpiration decreases, the temperature difference between plots becomes small, weakening the discriminating ability of canopy temperature at the plot level. Therefore, fusing multispectral and canopy temperature imagery for grain yield prediction at a critical growth stage can increase prediction accuracy, and the introduction of machine learning models can further improve prediction accuracy and selection decisions.
To achieve precise prediction results and reduce the risk of overfitting, machine learning algorithms usually adopt feature selection strategies to reduce the data dimensionality to a suitable level [34]. In this study, VIs with heritability greater than 0.5 were selected across the growth stages, meaning that all the base learners and their combinations were evaluated at maximum input data accuracy and repeatability. Previously, feature selection algorithms such as recursive feature elimination (RFE) [37] and Boruta [67] have been applied to prediction analysis. Unlike RFE and Boruta, feature selection based on heritability can be performed without knowing the predictor variables in breeding work. The successful use of heritability to reduce the number of input features, with high prediction accuracy, has been reported in a previous study [4]. Therefore, taking VIs with high heritability as input data increased the prediction accuracy and the reliability of the models in assessing genotypic variation.
Previously, several studies have exploited machine learning models such as RR, SVM, RF, and GP to predict grain yield and physiological attributes of plants from remote sensing data sets [32-36]. This study evaluated the combination of these base learners in an ensemble learning approach for grain yield prediction at different growth stages. It is difficult for a single machine learning model to predict several plant attributes with the same algorithm. For example, random forest is among the most powerful machine learning algorithms because of its proven accuracy, adaptability, and simplicity [32], and it has been used in applications ranging from forest growth monitoring to winter wheat leaf chlorophyll content estimation [32,68]. However, the performance of random forest was not as good as that of ridge regression in most cases considered in this study. The differing characteristics of trait composition make it difficult for any single algorithm to maintain prediction ability and accuracy when several varied traits are used as input data [34,37]. The RR model has been reported to show high accuracy and robustness under most modeling conditions [69-72]. In our study, better prediction results were also obtained by the RR model compared with the other models (Figure 6 and Table 5), which might be attributed to the model's biased estimation for collinear data and its comprehensive relationship with most of the traits [69,73]. However, the prediction ability of the RR model can be low in some cases, depending on traits, growth stages, and growing conditions [74]. Therefore, it is important to find a method that combines the advantages of multiple models to achieve improved grain yield prediction accuracy under various growth conditions.
Ensemble learning approaches have been reported to increase algorithmic diversity by combining different base learners, and combining more heterogeneous base learners yields an ensemble model with higher prediction ability. When implementing a stacking method, it is necessary to include self-sufficient, independent, and diverse base learners [34,37]. In this study, four machine learning algorithms with different principles and internal structures were successfully combined, achieving higher accuracy for predicting grain yield than any single base learner. In addition, the accuracy parameters (R² and RMSE) of the base learners fluctuated over wider ranges than those of the ensemble approaches (Figures 7-9), indicating the stability of the ensemble approaches on new data.
Among all the combinations of two, three, and four base learners, those containing the RR model performed well in terms of prediction accuracy across growth stages and treatments. Meanwhile, the regression coefficient analysis of the four-base-learner combination showed a higher weight for RR in most cases. Therefore, the RR model had a strong influence on the ensemble method for grain yield prediction. Combining two or three base learners yielded higher prediction accuracy than combining four base learners in some cases, but these combinations were unstable and performed poorly in other situations. Conversely, the combination of four base learners exhibited good accuracy and stability in all cases, which is of great significance for practical applications of the prediction model. Overall, the stacking regression method achieved higher prediction accuracy than the individual base learners, and the improvement was proportional to the number of base learners (Figure 10). This is consistent with the conclusion of a previous study estimating hourly, continuous ground-level PM2.5 concentrations [44]. The stacking regression can be further optimized for grain yield prediction. The results of this study showed that the larger the number of base learners, the higher the accuracy of the final model (Figure 10), suggesting that more machine learning algorithms could be incorporated into the stacking regression. Deep learning-based regression methods such as the multilayer perceptron (MLP) [75] could be added to the stacking procedure, while multiple birth SVM regression [76] and parallel RF regression [77] could be used as variants of the SVM and RF regressions for further improvement. In addition, stacking a large number of base learners requires a level-2 model that can handle multicollinear data [78].
Thus, ridge regression, the least absolute shrinkage and selection operator (LASSO), and elastic net regression (ENET) can be used as level-2 models to handle collinearity [78-81]. The grain yield predictions made by the single models under full and limited irrigation treatments were similar across the growth stages, indicating the adaptability of UAV data for grain yield prediction. The prediction results obtained from ensemble learning were very similar under the two irrigation treatments, i.e., full irrigation (R² = 0.625) and limited irrigation (R² = 0.628) at the mid-grain filling stage. This result indicates that the mid-grain filling stage is the most appropriate stage for predicting grain yield under full and limited irrigation treatments. This is rational, since grain filling is the stage at which wheat transfers organic matter such as starch and protein produced by photosynthesis from the vegetative organs to the grains, and the vegetation indices in this period are closely related to the final thousand-grain weight [82].

Conclusions
Recently, there has been an increasing focus on multi-model fusion in data analysis to increase prediction accuracy in crops. We successfully used the ensemble learning method to stack multiple base learners and increase grain yield prediction accuracy in wheat. The experimental results showed that stacking multiple base learners can improve trait estimation effectively compared with simple regression methods. Since UAV-based phenotyping platforms can estimate spectral information from multiple growth stages cost-effectively, ensemble learning approaches are important for increasing the accuracy of within-season yield predictions. Our results illustrate the usefulness of the ensemble learning approach for yield prediction using multi-stage multispectral data. We also demonstrated the weight of each base learner, namely random forest (RF), support vector machine (SVM), Gaussian process (GP), and ridge regression (RR), in developing the ensemble learning model for yield prediction. Grain yield prediction results were highest at the mid grain filling stage when the four-base-learner combination was used in the ensemble learning model, compared with other growth stages and base learner combinations. To date, relatively few studies have used information obtained by UAV-based sensors as input for ensemble learning prediction of grain yield in winter wheat. Further validation of ensemble learning methods on multiple crops and UAV-based data sets is required to increase their validity and applicability in crop breeding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data are available on request.

Conflicts of Interest: The authors declare no conflict of interest regarding this study.