Remotely Sensed Phenotypic Traits for Heritability Estimates and Grain Yield Prediction of Barley Using Multispectral Imaging from UAVs

This study tested the potential of parametric and nonparametric regression modeling utilizing multispectral data from two different unoccupied aerial vehicles (UAVs) as a tool for the prediction of and indirect selection of grain yield (GY) in barley breeding experiments. The coefficient of determination (R2) of the nonparametric models for GY prediction ranged between 0.33 and 0.61 depending on the UAV and flight date, where the highest value was achieved with the DJI Phantom 4 Multispectral (P4M) image from 26 May (milk ripening). The parametric models performed worse than the nonparametric ones for GY prediction. Independent of the retrieval method and UAV, GY retrieval was more accurate in milk ripening than dough ripening. The leaf area index (LAI), fraction of absorbed photosynthetically active radiation (fAPAR), fraction vegetation cover (fCover), and leaf chlorophyll content (LCC) were modeled at milk ripening using nonparametric models with the P4M images. A significant effect of the genotype was found for the estimated biophysical variables, which was referred to as remotely sensed phenotypic traits (RSPTs). Measured GY heritability was lower, with a few exceptions, compared to the RSPTs, indicating that GY was more environmentally influenced than the RSPTs. The moderate to strong genetic correlation of the RSPTs to GY in the present study indicated their potential utility as an indirect selection approach to identify high-yield genotypes of winter barley.


Introduction
The development of new crop varieties, displaying greater yield potential and stress tolerance, is critical for dealing with the expected increase in food demands of the global population [1,2]. Phenotyping is a key methodological approach in the breeding process for the selection of improved varieties [3]. Phenotyping is the process of characterizing the phenotype where phenotype, as defined by Pieruschka and Poorter (2012) [4], is the "combination of all the morphological, physiological, anatomical, chemical, developmental and behavioural characteristics that, when put together, represent the individual organism" (p. 1). In breeding practice, phenotyping includes measurement or visual evaluation by experts of important performance-or quality-related plant traits such as height, biomass or grain yield (GY), phenology, etc. Phenotyping is a time and labor-intensive process when traditional and manual methods are used and it can be a significant challenge in the current breeding programs, where it is not uncommon to have over a thousand breeding lines with replications across multiple environments [5]. Low efficiency in collecting grain yields. The authors mentioned NDVI and digital images as inexpensive methods to derive information for such traits, particularly those related to biomass.
Several studies in recent years have explored the association between spectral data/VIs and GY in barley (Hordeum vulgare L.) in the context of HTP with tractors [29][30][31] and UAVs [9,32,33]. The main questions in these studies concern the relative performance of UAVs versus ground sensors (such as GreenSeeker, Trimble, Westminster, CO, USA), the selection of an appropriate VI showing the highest correlation with GY, and the effect of phenology on this relationship.
The utilization of certain variables derived from remotely sensed data enables the forecasting of crop yields and retrieval of biophysical variables. This is possible because of the underlying mechanisms of how light interacts with leaves and canopy characteristics [34][35][36][37]. Consequently, diverse methods for yield estimation and biophysical variable retrieval are delineated in the literature [38,39]. The most adopted methods are the statistical regression algorithms that are divided into parametric and nonparametric algorithms. While parametric models are easy to calibrate, the nonparametric models are more flexible.
Broad-sense heritability (H 2 ) is an important genetic parameter in crop breeding [40], as it allows plant breeders to estimate the degree to which genetic factors are responsible for variation in a particular trait. This information can then be used to make decisions about which breeding strategies to use to improve the trait [41].
In crop breeding, broad-sense heritability is typically used to estimate the genetic potential of a population for a given trait [42]. For example, if the H 2 of a crop trait is high, it suggests that the trait is strongly influenced by genetic factors, and, therefore, breeding efforts should focus on identifying and selecting the individuals with the best genetic potential for the trait. Conversely, if the H 2 of a trait is low, it suggests that the trait is strongly influenced by environmental factors, and breeding efforts should focus on identifying and selecting individuals with the best environmental adaptation. In remote sensing studies, remotely sensed traits are used to analyze the broad-sense heritability [43][44][45]. Additionally, heritability is used as feature selection technics for grain yield prediction [46,47].
The aim of the study is (1) to demonstrate the potential of using parametric and nonparametric regression models with multispectral data from UAVs as an indirect selection tool for grain yield in barley breeding experiments and (2) to estimate the proportion of phenotypic variability that is due to genetic factors using modeled biophysical variables and ground-measured NDVI.

Site (Field) and Experimental Design of the Study
The study was conducted during the 2020/2021 growing year in the experimental fields of the Institute of Agriculture, Karnobat, Southeastern Bulgaria ( Figure 1). The soil of the experimental fields was slightly acidic (pH is 6.2) Pellic Vertisol.
Despite the total precipitation during the vegetation period (October 2020 to June 2021) being 197.4 mm higher than the long-term average precipitation for the location (424.6 mm), it was unevenly distributed throughout the growing season, as shown in Table 1. The amount of precipitation for January was the highest, 142.8 mm, which was 291% more compared to the long-term values for the location. The precipitation in April was 86 mm, which was nearly double the long-term sum for that month, followed by a significantly drier period in May. Except for March, April, and June, the monthly temperatures were higher than the average long-term temperatures for all other months of the growing season. Overall, the weather conditions enabled the distinguishing of productive abilities of tested barley genotypes. Despite the total precipitation during the vegetation period (October 2020 to June 2021) being 197.4 mm higher than the long-term average precipitation for the location (424.6 mm), it was unevenly distributed throughout the growing season, as shown in Table 1. The amount of precipitation for January was the highest, 142.8 mm, which was 291% more compared to the long-term values for the location. The precipitation in April was 86 mm, which was nearly double the long-term sum for that month, followed by a significantly drier period in May. Except for March, April, and June, the monthly temperatures were higher than the average long-term temperatures for all other months of the growing season. Overall, the weather conditions enabled the distinguishing of productive abilities of tested barley genotypes. Fifty-five genotypes of winter barley varieties and breeding lines were grown in rainfed field conditions. The genotypes were sown in two competitive variety trials (CVTs): CVT 1 and CVT 2. In CVT 1, 4 Bulgarian varieties (Emon, Obzor, Kaskadyor 3, and Dariya) and 21 advanced breeding lines of 2-rowed barley developed at the Institute of Agriculture, Karnobat were grown. CVT 2 included 30 genotypes: 1 Bulgarian 2-rowed variety (Emon), 1 Serbian 2-rowed variety (Sladoran), 4 Turkish 2-rowed varieties (Hasat, Harman, Bolayir, and Burgaz), 15 advanced 2-rowed breeding lines (KT 337, A 9/14, 167Д-2/05, 176Д-1/05, 419Д-2/08, 419Д-5/08, 530Д-2/09, 671Д-3/10, 718Д-4/10, 639Д-3/10, 003Д-3/13, 939Д-4/13, 194Д-1/15, 218Д-1/15, and WS270Д-1/15), 1 Italian 6-rowed variety (Futura), and 8 advanced 6-rowed breeding lines (KT 2207, KT 2213, KT 3040, KT 3041, KT 1706, KT 2199,  SUE I, SUE II). The experiments were organized in a complete block design with 4 replications on plots of 10 m 2 with a sowing rate of 450 germinated seeds per m 2 . All genotypes included in CVT 1 and CVT 2 were sown on 25 October 2020. The standard technology for growing winter barley breeding materials at the Institute of Agriculture, Karnobat was applied. The predecessor was a pea-sunflower mix. One-time nitrogen (N) fertilization with a fertilizer rate of 30 kg/ha of active substance nitrogen in February 2021 was applied. The experiment was treated against weeds with a herbicide combination of Biathlon and Scorpio. No pesticides were used to control diseases or pests, as no pathogens and pests were observed at densities above the economic threshold values during the winter barley growing season.

Measured Grain Yield (MGY)
All plants were harvested from each plot at maturity, threshed, and weighed to calculate the grain yield (kg/ha), Table 2.  (Table 3) with non-invasive methods. The leaf area index (LAI), fraction of absorbed photosynthetically active radiation (fAPAR), and fraction vegetation cover (fCover) were measured with an AccuPAR Ceptometer LP-80 (METER Group Inc., Pullman, WA, USA). Ten measurements evenly distributed over each plot (replication) were made and averaged, providing more representative vegetation canopy information.
Four representative plants were selected in the middle of each plot to measure leaf chlorophyll content (LCC) with a CCM-300 instrument (OPTISCIENCES). Chlorophyll content was measured in the middle of the flag leaf blade of these plants and averaged for each respective plot. The biophysical variables were only measured on the first and the second replication.  [48]. During the calculation of NDVI, the red and NIR spectral bands with a central wavelength of 660 nm (25 nm full width at half maximum (FWHM) and 780 nm (25 nm FWHM) were used.

UAV Data Collection
Two UAV platforms were used to obtain the multispectral data for the studied CVTs. These were a Sensefly eBee AG fixed-wing drone with a multispectral Parrot Sequoia camera (designated as PS in the following) and a DJI Phantom 4 Multispectral quadcopter (designated as P4M in the following). They were equipped with an integrated spectral sunlight sensor, which measured the sky down-welling irradiance and was used to retrieve the reflectance factors [49]. The cameras of both drones were provided with RGB and multispectral sensors. This study uses data from multispectral sensors. Both cameras featured green, red, red-edge, and NIR spectral bands (Table 5), but with different bandwidths. In addition, the P4M camera has a blue spectral band. A total of four flight missions were carried out over the two CVTs (Table 6) during 2021. The flight missions were carried out within the period between 11:00 a.m. and 02:00 p.m. local time while observing the same parameters. The height was 100 m for PS and 50 m for P4M. The spectral images obtained by the PS featured a spatial resolution of 5 cm/pixel and those obtained by P4M were 2.5 cm/pixel. Both UAVs were equipped with a GPS, whereas the DJI Phantom 4 featured built-in real-time kinematic positioning (RTK). This provided a horizontal accuracy of~5 m for the Sensefly eBee AG and 10-15 cm for DJI Phantom 4, respectively. During the flight missions, the development stage was recorded (Table 6) using BBCH identification keys [50].
The small size of the plots (8 m × 1.25 m) requires achieving maximally high accuracy during the orthorectification of the UAV images. For this purpose, prior to the first flight mission, easily discernible objects were fixed, which were used as ground control points (GCPs). They were positioned at the four corners at CVT 1 and CVT 2, they were made of white material, and they constituted a square sized 20 m × 20 cm. They were fixed firmly on the terrain and were used during all flight missions. The geographical coordinates of the GCPs were measured with an accuracy of 1-3 cm by the GNSS equipment Leica GS08 and the RTK regime.

Image Processing and Data Extraction
The raw UAV images were processed using the photogrammetric software Pix4Dmapper (https://pix4d.com, accessed on 27 April 2023) to generate an orthophoto mosaic for every flight mission. The Pix4Dmapper software provides pre-defined camera models for both PS and P4M. It applies radiometric corrections to the images utilizing the spectral sunlight sensor's data, which are recorded within the TIFF file header. The output represents the calculated reflectance and was provided as separate files for each spectral band. ENVI software was used to stack the bands into a multispectral image file; no additional adjustment was needed during the stacking because individual bands were aligned to each other by Pix4Dmapper. All mosaics were in the coordinate system UTM zone 35, datum WGS 1984. The multispectral mosaics were additionally geo-referenced in ArcGIS utilizing GCPs to achieve greater absolute geolocation precision and the correct overlapping of individual mosaics. A polygon layer with the boundaries of the plots was generated in ArcGIS by manually vectorizing them on an RGB image composite (from the first field campaign) used as a reference. The boundary of each plot was then subjected to a 10 cm inward buffering to avoid mixed pixels during data extraction. The buffered boundaries layer was used to extract the per plot-averaged reflectance for each spectral band, sensor, and date. The genotype name, replication number, and code of the plot were included in the attributive table for each polygon. In addition, spectral data were extracted from several bare soil areas, which were manually selected and vectorized in every mosaic.

Modeling and Statistical Analysis
The modeling process and statistical analysis are summarized in Figure 2. band. ENVI software was used to stack the bands into a multispectral image file; no additional adjustment was needed during the stacking because individual bands were aligned to each other by Pix4Dmapper. All mosaics were in the coordinate system UTM zone 35, datum WGS 1984. The multispectral mosaics were additionally geo-referenced in ArcGIS utilizing GCPs to achieve greater absolute geolocation precision and the correct overlapping of individual mosaics. A polygon layer with the boundaries of the plots was generated in ArcGIS by manually vectorizing them on an RGB image composite (from the first field campaign) used as a reference. The boundary of each plot was then subjected to a 10 cm inward buffering to avoid mixed pixels during data extraction. The buffered boundaries layer was used to extract the per plot-averaged reflectance for each spectral band, sensor, and date. The genotype name, replication number, and code of the plot were included in the attributive table for each polygon. In addition, spectral data were extracted from several bare soil areas, which were manually selected and vectorized in every mosaic.

Modeling and Statistical Analysis
The modeling process and statistical analysis are summarized in Figure 2.  The modeling process had two objectives: first, to retrieve GY from UAV multispectral data, and second, to retrieve the LAI, fAPAR, fCover, and LCC for the four replications for heritability analysis.
We trained parametric and nonparametric regression models using in situ and UAV multispectral data to retrieve GY at two development stages, milk ripening and dough ripening. The best GY model was applied to present a high-resolution map of the estimated GY. With the in situ measurements of the LAI, fAPAR, fCover, and LCC from only two replications, we trained nonparametric models for each biophysical variable retrieval. The best model for the LAI, fAPAR, fCover, and LCC was applied to the four replications plot for each genotype and used as remotely sensed phenotypic traits (RSPTs) for heritability analysis in crop selection.
The regression modeling was carried out with the ARTMO toolbox [51,52] (https: //artmotoolbox.com/, accessed on 27 April 2023), version 3.29. Previous studies have shown that incorporating bare soil samples enhances the predictive power of models [53,54]; therefore, bare soil samples were included in the training data. The models were trained with training data, representing 2/3 of all the available data, and were optimized with tenfold cross-validation. The trained models were validated with validation data, representing 1/3 of all the available data, and were not used for training. The best-performing cross-validation models were selected according to the normalized root-mean-square-error (nRMSE). The selected metrics for the validation models are coefficients of determination (R 2 ), root-mean-square-error (RMSE), nRMSE, relative RMSE (rRMSE), and Nash-Sutcliffe efficiency (NSE). The equations for these metrics are described in [55]. The goodness-of-fit metrics for the cross-validation were calculated by the ARTMO toolbox and the validation dataset in Python. Additionally, for GY retrieval, the Akaike information criterion (AIC) was calculated for each validation model as an additional tool for a comparison of the models [56].
Four widely used machine learning regression methods, partial least square regression (PLSR), random forest regression (RFR), kernel ridge regression (KRR), and Gaussian processes regression (GPR), and several generic types of vegetation indices (Table 7) and parametric functions, such as linear, exponential, logarithmic, power, and polynomial, were tested in this study. PLSR [68] is a statistical technique used to identify the linear combinations of predictor variables that are highly correlated with a response variable. PLSR is especially useful in cases of noisy data and is computationally efficient; however, it is sensitive to outliers. RFR [69] is an ensemble machine learning method for regression problems. It combines multiple decision trees to produce a more accurate and stable prediction compared to a single decision tree. RFR can model both linear and non-linear relationships between the predictors and response variable; it is resistant to overfitting and robust to outliers. KRR [70] and GPR [71] are kernel-based methods. The kernel function used in our study was the radial basis function (RBF). The KRR model includes a regularization term (ridge regression) to prevent overfitting and a kernel function to transform the features into a higher-dimensional space. The kernel function maps the original features into a new space where the linear regression model can fit the data better and make more accurate predictions. The advantages of KRR include its ability to handle non-linear relationships between the predictors and response variables, its robustness to noisy data, and its ability to perform well on small datasets. GPR is a Bayesian machine learning technique. It models the target function as a Gaussian process, which is a collection of random variables of any finite number that have a joint Gaussian distribution. The advantages of GPR include its ability to make predictions with uncertainty estimates, its ability to handle non-linear relationships between the predictors and response variables, and its ability to adapt to changes in the underlying function as more data becomes available.
The phenotypic variance, including remotely sensed parameters, MGY, and groundmeasured NDVI, was analyzed using a mixed linear model implemented in the R package "metan" that uses REML/BLUP (restricted maximum likelihood/best linear unbiased prediction) to estimate the variance components.
The function for analyzing single experiments (one-way experiments) using a mixedeffect model based on the following equation was used: where y ij is the value observed for the ith genotype in the jth replicate (i = 1, 2, . . . , g; j = 1, 2, . . . , r); g and r being the number of genotypes and replicates, respectively; α i is the random effect of the ith genotype; τ j is the fixed effect of the jth replicate; and ε ij is the random error associated to y ij . Broad-sense heritability (H 2 ) for each trait was calculated as the proportion of phenotypic variance explained by genetic variance [72].
Genotypic correlation (rG) among trait values was as follows: where COVGxy is the genotypic covariance of trait x and yield and σ 2 Gx and σ 2 Gy are the genotypic variances of trait x and yield, respectively. Response to selection (R) of the RSPTs and grain yield and correlated response (CR) for grain yield by using RSPTs were calculated as: where h x is the square root of heritability and σ x is the genotypic standard deviation: where h x is the square root of heritability of trait x (RSPTs), rg is the genetic correlation between the trait x (RSPTs) and y (MGY), and σ y is the genotypic standard deviation of trait y (MGY) [72]. The relative selection efficiency was calculated as the ratio of CR of grain yield for a specific RSPT and R of MGY [72].

GY Retrieval
We trained models with the data using four replications per genotype from the four UAV missions independently to compare which development stage and camera were better suited for GY retrieval. The models were evaluated according to both cross-validation and validation metrics. The same (random) allocation of replicates among training and validation sets was used for all models.
The best-performing nonparametric model (Table 8 and Figure 3) was GPR with data from the May 2021 campaign and DJI Phantom 4/P4M (Mission2), with an R 2 validation of 0.61, as shown in Figure 4. This model had, with the validation dataset, the lowest RMSE (1034.85 kg/ha) and good rRMSE (16.01%) and NSE (0.60) [55]. It was followed by FRF with data from the May 2021 campaign and eBee Ag/PS (Mission1), with an R 2 validation of 0.45. The models with data from the June 2021 campaign performed poorly with the validation data, with an R 2 validation of 0.33 for eBee Ag/PS (Mission3) and 0.36 for DJI Phantom 4/P4M (Mission4). of 0.61, as shown in Figure 4. This model had, with the validation dataset, the lowest RMSE (1034.85 kg/ha) and good rRMSE (16.01%) and NSE (0.60) [55]. It was followed by FRF with data from the May 2021 campaign and eBee Ag/PS (Mission1), with an R 2 validation of 0.45. The models with data from the June 2021 campaign performed poorly with the validation data, with an R 2 validation of 0.33 for eBee Ag/PS (Mission3) and 0.36 for DJI Phantom 4/P4M (Mission4).   The best-performing parametrical model (Table 9) was with two spectral bands index, EVI2-like with Ra = 730 and Rb = 840, linear function and data from the May 2021 campaign and DJI Phantom 4/P4M (Mission2). The model achieved an R 2 validation of 0.45, which was closely followed by the model with May 2021 campaign with eBee Ag/PS (Mission1), with an R 2 validation of 0.44. However, both models had the same value as AIC, which indicates the ability to fit the data while avoiding overfitting. Therefore, we can consider both models equal in performance for the validation dataset. The best-performing parametrical model (Table 9) was with two spectral bands index, EVI2-like with Ra = 730 and Rb = 840, linear function and data from the May 2021 campaign and DJI Phantom 4/P4M (Mission2). The model achieved an R 2 validation of 0.45, which was closely followed by the model with May 2021 campaign with eBee Ag/PS (Mission1), with an R 2 validation of 0.44. However, both models had the same value as AIC, which indicates the ability to fit the data while avoiding overfitting. Therefore, we can consider both models equal in performance for the validation dataset. The nonparametric models performed better than the parametric models independently of the retrieval method and camera used, the milk ripening development stage, or May campaign, which were better suited for GY retrieval (Tables 7 and 9) than the dough ripening or June campaign. This was clear from the AIC values of the parametric models in Mission1 and Mission2, which were higher than the AIC values of the nonparametric models in Mission3 and Mission4.

Biophysical Variable Retrieval
We trained nonparametric models using data from two replications per genotype and data from a UAV mission that produced the best GY estimation results, specifically milk ripening during Mission2 (DJI Phantom 4/P4M). The models were evaluated using both cross-validation and validation metrics. The best results from the LAI, fAPAR, fCover, and LCC retrieval are presented in Table 10 and Figure 5.
The evaluation of models is complicated task because the variables being analyzed may have differences in magnitude, range, mean, and variance [55]. The LAI retrieval model had the highest performance, with a validation R 2 of 0.71. The validation RMSE of 0.56 m 2 m −2 was also close to the recommended excellent result for the LAI [55], and the rRMSE and NSE values were good. The fAPAR and fCover retrieval models had validations R 2 of 0.52 and 0.48, respectively. The LLC retrieval model, on the other hand, only obtained a validation R 2 of 0.44.
The best model for fAPAR and fCover was GPR, which provides uncertainty estimates. Therefore, an uncertainty estimation was generated by each GPR model. The uncertainty for the validation dataset for fAPAR was under 5% and for fCover, it was under 6%.
The best-performing model per biophysical variable, as shown in Table 10, was then used to retrieve four replications per genotype for the LAI, fAPAR, fCover, and LCC ( Figure 6). The modeled biophysical variables are used as RSPTs for the analysis of broadsense heritability.

Broad-Sense Heritability
A significant effect on the genotype for all evaluated traits according to likelihood ratio validation (Table 11), indicating genetic variability between barley genotypes, was found.

Broad-Sense Heritability
A significant effect on the genotype for all evaluated traits according to likelihood ratio validation (Table 11), indicating genetic variability between barley genotypes, was found. High heritability estimates were found for MGY and RSPTs in CVT2 and pooled data (Table 12). CVT1, MGY, NDVI_1, and NDVI_2 had moderate heritability values, while RSPTs had high heritability. MGY heritability was lower, with a few exceptions compared to RSPTs, indicating that MGY was more environmentally influenced than RSPTs. Considerably higher heritability estimates were found for the LAI, fAPAR, fCover, and LCC than ground-measured NDVI.

Genetic Correlation
The studied RSPTs showed statistically significant genetic correlations with MGY in CVT2 on the basis of all tested genotypes (pooled data) (Table 13). Insignificant correlations of MGY with NDVI_2 and LCC were found in CTV1. The estimated correlation coefficients for CVT2 were higher compared to those of CVT1 and varied from r = 0.459 for the LAI to r = 0.686 for NDVI_1.

Correlated Response of Grain Yield and Efficiency of Indirect Selection
MGY showed the highest response to direct selection (Table 14). Overall, in pooled data, the LAI had the highest correlated response, resulting in the highest indirect selection efficiency for MGY. NDVI_2 gave the lowest correlated response for MGY. Table 14. Selection response (R) for MGY and remotely sensed phenotypic traits, correlated response (CR) for grain yield using remotely sensed phenotypic traits, and relative selection efficiency (CR/R) of the remotely sensed phenotypic traits for grain yield in the pooled data from two CVTs of winter barley genotypes.

Grain Yield and Biophysical Variable Modeling
This study investigated the potential of two consumer-grad multispectral UAV sensors, PS and P4M, for predicting GY and retrieving four biophysical variables (LAI, fAPAR, fCover, and LCC) in winter barley trails. Both sensors have been previously tested for GY assessment in different crops, providing reasonable results [73][74][75][76].
Previous studies observed large fluctuations in the prediction power for remote sensing-based GY models obtained at different time points of the vegetation period. For example, Elsayed et al. [29] utilized PLSR to predict GY achieving R 2 between 0.11 and 0.52. The predictors were a set of spectral indices calculated from hyperspectral sensor data. The different models were calibrated with data collected at growth stages BBCH 55 and 60 and under mild and severe drought stress conditions. In two related studies (using the same hyperspectral sensor and PLSR), Rischbeck et al. [30] achieved an R 2 of up to 0.65 for a model calibrated at growth stage BBCH 59, while Barmeier et al. [31] achieved an R 2 varying between 0.71 and 0.95 for a model at anthesis. A tendency for increasing the accuracy of prediction with the advance of crop development is usually reported in the literature [9,27]. Yield modeling in this study was performed at two relatively late time points, at the milk ripening and dough ripening stages, respectively. We observed that GY was predicted with better accuracy in the milk ripening stage. The dough ripening, on the other hand, appeared to be too late for accurate prediction, probably because spectral differences between genotypes became less pronounced with the senescence. The retrieval of agronomic parameters through remote sensing usually requires high variability in the modeling calibration dataset for optimal results [76]. Moreover, we used data from a single field campaign for the models. Combining data from several time points, or different treatments, during the vegetation period may improve modeling results [75,76].
The nonparametric models performed better than the parametric ones in predicting GY. This result was in agreement with other studies, which have shown the superiority of machine learning algorithms over VI-based regression models for predicting crop traits [77,78].
During the field campaigns, we collected image data from both UAV sensors on the same day and under similar illumination conditions (clear sky) to facilitate their comparison. Based on the results of the present study, we can conclude that the P4M sensor was better suited for assessing GY with nonparametric models (Table 8). However, when using parametric models (Table 9), no conclusion can be drawn about the relative advantages of the sensors in assessing GY. Nevertheless, the most accurate model of GY in this study was based on P4M data. It should be noted that the nonparametric models utilized the full spectral information from a sensor (all bands), whereas the parametric ones usually relied on three-band and two-band VIs, or even single-band reflectance. Therefore, results from the nonparametric modeling may provide a more integrated assessment of the full potential of the sensor. The main differences between the sensors with respect to spectral bands consist of the lack of a blue band in PS and the different width and central wavelength of the red-edge and near-infrared bands ( Table 5).
The retrieval accuracy varied considerably between the biophysical variables. The prediction was the most successful for the LAI, while LCC was the most challenging to estimate using UAV data. The trial in this study was conducted in rainfed conditions and there were no treatments of varying agronomic management (fertilization, irrigation, etc.). The lack of variation in growing conditions was expected to result in lower variation in GY and biophysical variables among genotypes and, therefore, lower correlations, as discussed earlier. In particular, the lower coefficients of determination for fAPAR, fCover, and LCC models, compared to the LAI, could be partially attributed to the low variability of these variables ( Table 3). The LAI was modeled with good accuracy in this study (R 2 = 0.71; nRMSE = 13%). Moreover, the reported accuracy was in the validation dataset and was not included in the training, whereas in other studies [79], the reported accuracy was only in the cross-validation dataset. Furthermore, the accuracy could be improved by adding canopy surface height [80,81] and texture variables [80] as predictors.

Broad-Sense Heritability and the Application of Remotely Sensed Traits for Breeders
The genetic correlation describes the genetic relationship between two traits and provides a measure of the rate at which traits respond to indirect selection. The presence of a sufficiently high level of genetic diversity is a necessary condition for an accurate estimation of this association. Accordingly, the included CVTs in our experiment represented the winter barley varieties with different origins and the most advanced breeding lines from the breeding program at the Institute of Agriculture, Karnobat. Furthermore, a significant genotypic effect was found for all studied RSPTs and measured grain yield. The genetic correlations between grain yield and most of the RSPTs were consistently significant across two sets of tested genotypes, demonstrating the reliability of these estimates.
The moderate to strong genetic correlation of RSPTs to grain yield in the present study indicated the potential use of RSPTs as an indirect selection approach to identify high-yield genotypes of winter barley. The improvement of grain yield is a primary goal for most barley breeding programs, but the complex nature of this trait complicates breeding and the selection of lines with high yield potential. The effectiveness of indirect selection depends on using secondary traits that have a genetic correlation with grain yield and higher heritability than yield. Spectral reflectance indices have been utilized as such traits for the indirect selection of wheat lines with high yield potential [8,43,44,[82][83][84].
The studied RSPTs had generally higher heritability than grain yield. Despite that, all RSPTs had lower efficiency of indirect over the direct selection for grain yield. Gizaw et al., 2016 [44] point out that spectral reflectance can be used in plant breeding, not just as a standalone indirect selection criterion, but also as a component of an integrated selection approach. According to these authors, integrated selection of grain yield and spectral reflectance indices increased the repeatability of genotypic performance across multiple trials.
Moreover, in addition to genetic gain, factors such as phenotyping speed and cost are important for selection efficiency [85]. The remotely sensed phenotyping allows screening of a larger set of germplasm than yield-based selection. Manneveux and Ribaut 2006 [85] proposed using lower selection intensity for selection based on spectral reflectance indices so that most superior genotypes could be advanced through yield-based selection, which increased the indirect selection efficiency substantially.

Limitations, Challenges, and Future Opportunities
This research demonstrated a possible use of remote sensing techniques and data in plant breeding experiments through the successful use of both parametric and nonparametric regression models to retrieve yield and remotely sensed phenotypic traits in winter barley. However, there are some limitations and challenges that are outlined below, along with suggested solutions.
(1) Remotely sensed phenotypic traits are part of the next-generation physiological breeding research [86]. However, the appropriate range of goodness-of-fit metrics for biophysical variables models to be utilized as RSPTs is yet to be established. Additionally, conducting an uncertainty analysis on the RSPTs used as input data in plant breeding analysis is a critical aspect that should be routinely implemented. Propagating uncertainty from the acquisition of in situ and remotely sensed data to the modeling stage will enable plant breeders to gain a comprehensive understanding of the precision and accuracy of the proposed RSPTs. (2) In our current study, we analyzed UAV imagery with four and five spectral bands.
Our findings indicate that the five-band imagery resulted in better models, suggesting that further testing of multispectral and hyperspectral UAV imagery is needed to obtain the optimal number and type of bands that may yield improved retrieval models [87].
(3) The current models have been trained and evaluated using data from one growing year, specifically from the breeding lines of the Institute of Agriculture, Karnobat breeding program. While they have shown promising results, the robustness of these models throughout multiple growing seasons still needs to be evaluated. (4) The selection environment is crucial to assessing the genetic gain and efficiency of alternative selection approaches [85]. Therefore, for the complete assessment of the efficiency of RSPTs for the indirect selection of high-yielding barley lines, additional studies under different environments (growing years and locations) are needed.

Conclusions
The primary goal of a breeding program is to develop cultivars with high yields and superior quality that can be released to farmers. To achieve this objective, it is essential (1) to estimate GY as soon as possible during the growing season and (2) to identify remotely sensed traits that could have a role in the broad-sense heritability analysis. Data from DJI Phantom 4 Multispectral at the milk ripening stage predicted barley GY most accurately. Biophysical variables, the LAI, fAPAR, fCover, and LCC have been retrieved with data from DJI Phantom 4 Multispectral at the milk ripening stage and are used as RSPTs for broad band heritability. They had generally higher heritability than grain yield but lower efficiency over the direct selection for grain yield. However, the acceptable range of goodness-of-fit metrics for biophysical variables models to be used as RSPTs is still to be determined.
We are of the opinion that these initial findings have the potential to accelerate crop improvement programs. However, further robust interdisciplinary research [88] is still required to strengthen these results.