Dynamics and genetic regulation of leaf nutrient concentration in barley based on hyperspectral imaging and machine learning

Partial least squares regression


Introduction
The proliferation of global human population urges for an adequate boost of crop productivity [1]. During the last decades, crop breeding was very successful in increasing grain yield in cereals, which are the main source of calories for human diet [2]. However, this process led to a 'dilution effect', meaning the impoverishment of the protein and mineral concentration in the grain due to the allocation of a fixed amount of nutrients into an increased number of grains [3,4].
The reduction of grain quality and nutritional value has important consequences, as roughly one billion people suffer from low intake of proteins and mineral nutrients [5][6][7]. Moreover, nutrient remobilization from plant vegetative tissues into grains is a limiting factor for grain development, and therefore critical for increasing grain yield [8].
Biofortification is the enrichment of nutrients in crop plants. It can be reached by increasing fertilizer application (agronomic biofortification). However, this solution has only short-term effects [9]. In contrast, genetic biofortification, which is the biofortification obtained through Abbreviations: AI, artificial intelligence; ANOVA, analysis of variance; BLUEs, best linear unbiased estimators; CV, cross-validation; DR, detection rate; GWAS, genome-wide association study; HEB, Halle Exotic Barley; HIS, hyperspectral imaging; HR-ICP-MS, high resolution inductively coupled plasma mass spectrometry; MLP, multilayer perceptron; MRE, mean relative error; MTA, marker-trait associations; NAM, nested association mapping; PLS, partial least squares regression; QTL, quantitative trait locus; r, Pearson's correlation coefficient; R 2 , coefficient of determination; RBF, radial basis function network; RI, remobilization index; RP, relative performance; RPD, residual prediction deviation; SNV, standard normal variate. plant breeding or genetic engineering, constitutes a sustainable and economically convenient prospect to increase nutrient concentration in harvested plant organs [10]. Biofortification of elite crop material can, thus, contribute in achieving a balanced diet for humans and livestock [6].
Mineral elements are taken up from the soil, then mostly transferred to leaves and subsequently remobilized from leaves into developing grains through xylem transportation and, finally, phloem loading and un-loading. Nutrient concentration in leaves is therefore a crucial feature for the final grain nutritional value [11]. Senescent leaves, for instance, play an important role as a source of amino acids and nutrients after protein and chlorophyll degradation [12]. During leaf senescence, proteins, predominantly ribulose-1,5-bisphosphate carboxylase⁄oxygenase (Rubisco), and nutrients are degraded, translocated to younger leaves and, finally, to developing grains. Small-grain cereals like barley, wheat and rice remobilize up to 90 % of the nitrogen from leaves to grains [13]. The major phloem-exported amino acid in barley and wheat is glutamate [14]. Glutamine synthetase (GS) is, thus, of major importance for the re-assimilation of ammonium into exported amino acids during senescence [15]. A delayed leaf senescence, or 'stay green' phenotype, has the potential for higher yield due to a longer period of active photosynthesis [16]. As a consequence, stem and leaf tissue should be photosynthetically active as long as possible, but should also be able to quickly respond to abiotic stresses, such as heat or drought, and remobilize proteins and nutrients with high efficiency to grains [13]. In barley leaves the number of chloroplasts per mesophyll cell are largely constant until late senescence, while the chlorophyll and protein concentration is reduced early on [17].
The barley NAC transcription factor HvNAM-1 is a well-described gene involved in senescence and in rapid chlorophyll degradation, and remobilization of nitrogen, iron and zinc to the grain [18]. In addition, other members of the NAC transcription family are up-regulated during leaf senescence, where, for instance, the barley gene HvNAC013 directly interacts with the radical-induced cell death 1 (RCD1) gene [19]. The observation that HvNAM-1 is associated with the acceleration of senescence indicates the involvement of other developmental genes in nutrient partitioning among plant organs [20]. Possible candidates comprise genes encoding transporters involved in phloem loading at source leaves and phloem unloading at sink. For instance, the enhanced expression of the OsNRT2.1 gene, which encodes a high-affinity nitrate transporter, using a nitrate inducible promoter, increased nitrogen-use efficiency in rice [21]. A similar effect was observed in maize (Zea mays) by the targeted expression of the ZmNRT1.1A and ZmNRT.1B transporters [22]. These examples indicate the potential of genetic biofortification through the improvement of nutrient remobilization from leaves to grains.
Besides genetic variability, a major limiting factor for biofortification is plant phenotyping. Cost-efficient high-throughput phenotyping is necessary for exploiting genetic resources. High-throughput hyperspectral imaging (HSI) in combination to machine learning methods can be applied to predict mineral concentrations in plant tissues [37,38]. A hyperspectral image consists of a two-dimensional image of a tissue, extended by multiple additional dimensions, the spectral reflectance data. Both are obtained by hyperspectral camera systems, usually operating at wavelengths between 400 and 2500 nm, to create a multi-dimensional data cube [39][40][41][42]. HSI proved to be an efficient way for predicting grain nutrients and allowed performing a genome-wide association study (GWAS) to identify quantitative trait loci (QTL) controlling grain nutrient concentrations in HEB-25 [43]. In addition, monitoring hyperspectral canopy reflectance performed close to visual scoring in assessing senescence dynamics in wheat, aiming at predicting grain yield and grain protein concentration [44].
Here, we evaluated the potential of non-invasive high-throughput HSI in predicting the concentration of 15 mineral elements (C, B, Ca, Cd, Cu, Fe, K, Mg, Mn, Mo, N, Na, P, S, Zn) in leaves of the HEB-25 population, sampled during plant development in the field. Values for R 2 , MRE and heritability of the predicted leaf nutrient concentrations were estimated to characterize prediction quality. We developed several derived traits on the basis of predictions, describing the kinetics of leaf nutrient remobilization, and correlated them to plant developmental and yield component traits to elucidate their interdependence.

Field trials and trait scoring
The HEB-25 population and controls were grown in Halle (Germany; 51 • 29'46.24" N, 11 • 59'27.70" E) during two years (2016 and 2017), following an incomplete randomized block design. In both years, HEB-25 lines and control genotypes were sown in a total of 1,437 plots per treatment, i.e. 5,748 field plots across two years and two treatments per year. Plot size was 0.3 m 2 , made of a double row with 30 seeds per single row, a spacing of 0.20 m between rows and a plot length of 1.50 m. In 2016, plants were grown under two nitrogen (N) fertilization treatments (N0 = without fertilizer; N1 = with fertilizer). In N0, the concentration of plant available nitrogen in soil (N min ) was 30 kg/ ha (based on extracting ammonium and nitrate N with calcium chloride from a 90 cm soil sample, taken in February of each trial year). After measuring N min , the N1 block was fertilized with 60 kg/ha of calcium ammonium nitrate (ENTEC26, Eurochem Agro) to achieve a targeted available nitrogen level of 90 kg N/ha. Fertilization was carried out two weeks after sowing. In blocks N0 and N1, the herbicide Biathlon 4D® was applied at shooting stage, the herbicides DUANTI® and Axial 50® as well as the fungicide Adexar® and the insecticide Sumicidin Alpha EC® were applied during grain filling stage. All treatments followed local practice.
In 2017, N min was 70 kg/ha before fertilization. Due to this very high N min content no significant difference in soil N min could be realized since spring barley is supposed to grow with an available nitrogen level between 60 and 120 kg N/ha. The experimental setup for 2017 was, thus, changed in such a way that field blocks were not differentiated based on nitrogen fertilization. Instead, the two blocks were distinguished based on the absence (Fun0) and presence (Fun1) of fungicide treatment. In block FUN0, the herbicide Biathlon 4D® was applied at shooting stage. The herbicide Axial Komplett® and the insecticide Sumicidin Alpha EC® were applied during flowering time. In addition, the herbicide DUANTI® was applied during grain filling stage. In block FUN1, the herbicide Biathlon 4D® was applied at shooting stage. The fungicide Vegas® Proline, the herbicide Axial Komplett® and the insecticide Sumicidin Alpha EC® were applied during flowering time. In addition, the fungicide BONTIMA® and the herbicide DUANTI® were applied during grain filling stage. All treatments followed local practice. The fungicides were selected based on their effectiveness against the barley foliar diseases powdery mildew (Blumeria graminis), leaf blotch (Rhynchosporium secalis), leaf rust (Puccinia hordei) and net blotch (Pyrenophora teres). Moreover, these fungicides have no known stay-green effect, as it is the case for other commonly used fungicides. No additional nitrogen fertilizer was applied to blocks Fun0 and Fun1. Growth regulators were not used. In 2016, sowing took place on March 14 and 15 and harvesting by hand started on July 04, lasting 2 weeks. In 2017, sowing took place on March 27 and 28 and harvesting by hand started on July 18, lasting 2 weeks. Field management was in accordance with local practice.
Plant development and yield formation of the studied lines were characterized by eight trait parameters ( Table 1) calculated from weighing the MARVIN optical seed analyser grain sample (explained under GEA) and extrapolating to the weight of 1000 grains; grain yield (YLD): total grain yield in dt/ha, determined after hand harvesting grains from the full plot.

Nutrient analysis via wet-lab chemistry
In this study, 15 traits of nutrient concentration in leaves were investigated, including the concentration of macro elements nitrogen (N), phosphorus (P), potassium (K), carbon (C), calcium (Ca), magnesium (Mg), sulfur (S), and micro nutrients boron (B), copper (Cu), iron (Fe), manganese (Mn), molybdenum (Mo), sodium (Na), nickel (Ni) and zinc (Zn). Leaf samples at four developmental stages were collected. The developmental stages corresponded to plant shooting (stage A, BBCH 31-39), heading (stage B, BBCH 49-59), grain filling (stage C, BBCH 71-77) and maturity (stage D, BBCH 89-92, Table 1). At stage A the youngest leaves and at stages B to D flag leaves were sampled from five representative plants per field plot. Leaf samples were immediately frozen in liquid nitrogen and subsequently stored at − 20 • C until biochemical analysis. For nutrient analysis, the dried material obtained from frozen leaves was placed in 20 mL vials (HDPE, Zinsser Analytic GmbH, Frankfurt am Main, Germany) and homogenized with two stainless steel balls (8 mm diameter, Mühlmeier GmbH & Co. KG, Bärnau, Germany) for one min. at a frequency of 30 Hz using a Retsch mixer mill MM 400 (Retsch GmbH, Haan, Germany). For C and N, 1.5 mg of homogenized material were analyzed by a Euro EA300 (Euro-Vector, Pavia, Italy), based on the principle of Dumas [46], using the software Callidus version 5.1. For calibration, the standard 2,5-Bis-(5-tert-butyl-2-bezo-oxazol-2-yl)thiophen with 72.52 % C and 6.51 % N was used (HEKAtech GmbH, Wegberg, Germany). For the analysis of B, Ca, Cu, Fe, K, Mg, Mn, Mo, Na, Ni, P, S and Zn, the homogenized material (10 mg) was weighted into PTFE digestion tubes and added with Nitric acid (1.0 mL; 67-69 %, Bernd Kraft, Duisburg, Germany). After a 4-h incubation, the samples were digested under pressure using a high-performance microwave reactor (UltraClave 4; MLS, Leutkirch, Germany). The digested samples were transferred to Greiner centrifuge tubes and diluted with de-ionized (Milli-Q®) water to a final volume of 8.0 mL. The elemental analysis was carried out by High Resolution Inductively Coupled Plasma Mass Spectrometry (HR-ICP-MS; ELEMENT 2, Thermo Fisher Scientific, Dreieich, Germany) with Software version 3.1.2.242, using a SC-2 DX Autosampler (ESI, Elemental Scientific, Mainz, Germany). A six-point external calibration curve was set from a certified multiple standard solution (Bernd Kraft, Duisburg, Germany). The elements Rodium (Rh) and Germanium (Ge) were infused online and used as internal standards for matrix correction. The measuring unit of leaf nutrient concentration is micrograms per gram of dry weight, except for N and C, whose concentration is expressed in percentage (weight/leaf dry weight).

Field hyperspectral imaging
Hyperspectral imaging in the field was performed using the field sensor platform AgRover, developed by Fraunhofer IFF as an instance of its HawkSpex® Scan technology (Fig. 1). Per developmental stage hyperspectral reflectance data were collected within two days from all field plots together with wet-lab sampling of leaves from approximately 15 % of randomly selected plots per stage. The wet-lab samples were used for subsequent model generation (see below). The AgRover is a novel multimodal field sensor system for high-throughput phenotyping of plant development in field trials. The sensor system collects hyperspectral, RGB and depth data at a measuring distance of 1 m above the barley plants. Leaf reflectance spectra were acquired by a hyperspectral camera (HySpex SWIR 384, Norsk Elektro Optikk, Norway), with 288 channels in the range of 1,000 to 2,500 nm. The camera system can be flexibly positioned over the plots to be measured via an extension arm. Absolute GPS-position of the measurement system was provided by a Table 1 List of plant phenotypic traits. Nutrient concentrations are given in ppm (μg/g dry weight) for all nutrients except for N and C. The latter are given in % (g/100 g dry weight, DW). BBCH is a scale for plant developmental stages [66]. . For the establishment of defined lighting conditions, the camera system is encapsulated in a shading box to protect it from external ambient light. Inside the measuring box, a halogen light source ensures homogeneous illumination of the measuring field. Two 300 W shortwave spotlights with a broad power spectral density were installed (Hedler C12, Hedler Systemlight, Runkel/Lahn, Germany). A PTFE Spectralon (54.5 × 35 cm) was used as a calibration standard covered with a borosilicate glass for protection.

Prediction of nutrient concentration via hyperspectral imaging, data processing and machine learning
Hyperspectral image cubes were processed by the automated workflow system AutoML platform HawkSpex® Flow developed by the Fraunhofer IFF. In order to obtain reflectance values, each measurement contains a calibrated white target with known reflectance, which was automatically marked and extracted. Reflectance calculation was performed using where I λ is the image pixel intensity at wavelength λ, I DC λ the intensity when measured with closed shutter ("dark current") and I W λ being the intensity while recording a white calibration target.
For modeling, the pixels that represent the leaf material were extracted from the reflectance hyperspectral images by a multistage segmentation process. In the first step, plant material was isolated from the general background (soil, white target, etc.). In the second step, the plant material was segmented into barley and weeds. In the last segmentation step, the plant material of barley was reduced to the leaves. Stems and, in later stages of development, pronounced ears were removed. For segmentation, a machine learning model based on the reflectance image was trained to classify each spectral pixel. Different configurations of Multilayer Perceptron (MLP) models and Radial Basis Function Network (RBF) were used (see Table S1). The labeling of the spectral images was carried out with the help of a combination of unsupervised clustering algorithms and manually segmentation.
For each plot a mean spectrum over 1,000 spectral samples was calculated and was assigned to its corresponding nutrient concentrations value, determined by wet chemistry. This procedure generated the sample-target assignment needed for machine learning.
In this study, several regression machine learning models were tested to predict the leaf nutrient concentration in barley (see Table S1). Due to computational demand, only 30,000 spectra per dataset were used in the modeling. The 'leave-one-out' validation scheme was applied. The coefficient of determination (R 2 ) was used as a performance criterion for the prediction. R 2 was defined as the squared Pearson correlation coefficient: where p i is the nutrient concentration prediction for sample i, while o i is the ground truth target (observed) nutrient value with p and o being their respective averages, and σ p and σ o being their respective standard deviations. In order to better evaluate the prediction performance, other parameters were also used as a further valuable quality measure: the mean relative error (MRE) between target and prediction over all observations, and the residual prediction deviation (RPD). Furthermore, a remobilization index (RI) was developed for this study in order to evaluate the impact of nutrient concentration variation on R 2 , MRE and RPD. MRE, RPD and RI were calculated according to the following formulas: The AgRover is a novel multimodal field sensor system for high-throughput phenotyping in field trials. The sensor system collects hyperspectral, RGB and depth data at a measuring distance of appr. 1 m above plants. C) Leaf reflectance spectra are collected through a hyperspectral camera (HySpex SWIR 384, Norsk Elektro Optikk, Norway) with 288 wavelength channels in the range of 1,000 to 2,500 nm. D) Workflow of leaf nutrient prediction. Plants were cultivated in 2,874 plots p.a. Hyperspectral imaging data with AgRover and, in parallel, leaf samples from appr. 15% of plots were collected at four developmental stages. Modeling of leaf nutrient concentration was based on partial least squares regression (PLS), radial basis function network (RBF) and multilayer perceptron (MLP).
where: N is the number of samples; o, observed value; p, predicted value; SD, standard deviation of observed values; SEP, standard error of prediction; mA, mB, mC, mD are the median values of nutrient concentration at developmental stages A, B, C and D, respectively. Reflectance spectra from leaves in close-range imaging are highly influenced by plant geometry and its specific alignment towards the imaging system. This induces high uninformative variability in the recorded signals, whereas the spectral signature informing on plant biological traits remains undisclosed [47]. To reduce this uninformative variability, two normalization methods were applied to the spectra, namely the Standard Normal Variate (SNV) and the L2 normalization, according to the formulas: where R λ is a row vector containing the original spectrum and N the number of the observed wavelength (λ) [47].
In the comparison of the vector normalization methods, the vector L2 normalization showed the best prediction results and was used in our analyses.
For the prediction of leaf nutrient concentration, for each nutrient a one-year and a two-year model were trained. The model validation showed that the Partial least squares regression (PLS) model performed best in all cases, except for the predictions for Cu, S, Zn in 2016 and the two-year model for K, which are based on MLP models. For each nutrient prediction the best evaluated model was used. Model training was performed using the AutoML platform HawkSpex® Flow developed by the Fraunhofer IFF. Table S2 indicates the number and percentage of wetlab and predicted field plots generated per year and developmental stage.

Measuring phenotypic traits across four developmental stages
For a deeper analysis of the processes involved in nutrient remobilization from leaf, the values of nutrient concentration at different stages were used to calculate three kinetic parameters per nutrient describing the change of leaf nutrient concentration during plant development (here denoted as 'kinetic traits'; see Table 1). Kinetic traits were obtained for every field plot by calculating the integral of the nutrient concentration curve delimited by stages B and C (BC.a), C and D (CD.a) or B and D (BCD.a). Stage A values were excluded from calculation, because the dataset at this stage was unbalanced between the two experimental years (in 2016 only wet-lab data were available, as predictions were not possible due to low spectra quality).

Statistical analyses
Best linear unbiased estimators (BLUEs) of plant development traits, yield components, stage and kinetic traits were estimated across years by applying a mixed linear model in SAS 9.4 (SAS Institute Inc., Cary, NC, USA, PROC MIXED), with fixed effects for genotype (i.e. 1,420 HEB lines), random effects for year (i.e. 2016 and 2017) and random interaction effects of genotype x year [36]. To estimate variance components, all effects were assumed to be random (PROC VARCOMP). Broad-sense heritabilities across years were estimated based on variance components as: where V G , V GY and V R represent the variance components genotype, genotype × year, and error, respectively. The terms y and r indicate the number of years and replicates, respectively. Pearson's correlation coefficients (r) based on BLUEs were calculated with PROC CORR.
Although for leaf nutrient concentration only positive values can be expected, calculation of BLUEs lead to negative values in some cases where input values were close to zero. Negative values were converted to zero. This occurred in 3.7% of all calculations (one for B, one for Fe, 18 for Na, 457 for Ni). Analysis of Variance (ANOVA) was performed in R language [48] via the Anova() function from car package.

Genome-wide association study (GWAS)
A genome-wide association study was conducted for stage and kinetic trait BLUEs across both experimental years, based on 50k Illumina Infinium iSelect SNP Array data [49]. For this, a multiple linear regression model was applied using SAS 9.4 Software (SAS Institute Inc., Cary, NC, USA) and Proc GLMSELECT [50]. The procedure selects the best model by including and excluding SNPs as co-factors through stepwise forward and backward regression. SNPs were allowed to enter or leave the model at each step until the Schwarz Bayesian criterion could not be reduced further. SNPs included in the final model are hereafter referred to as significant SNPs. The SNP effect estimate can be interpreted as the allele substitution effect (α) and represents the regression coefficient of the respective SNP in the final model. Based on the final model, a fivefold cross-validation (CV) was run 20 times to increase the robustness of the model selection. For each of the 100 CV runs, the model was trained on 80 % of genotypes (randomly selected) and validated on the remaining 20 % of genotypes. The mean squared Pearson product-moment correlation between observed and predicted phenotypes (R 2 ) in the training set was used to estimate the explained phenotypic variance in the GWAS model (R 2 train ), while the R 2 of the validation set was used as an estimate of the model prediction ability (R 2 val ). The statistical significance of marker-trait associations (MTA) was determined applying the Bonferroni-Holm method [51]. Significant MTA were accepted with P BON-HOLM <0.05. For every marker, the number of significant MTA counted across all CV runs was defined as detection rate (DR), and used to evaluate the robustness of the marker-trait association. For every trait, significant markers (at DR ≥ 50) located within an interval ≤10 cM were combined to one QTL. The effect of wild alleles in comparison to domesticated alleles was reported as relative value (in percentage) and denoted as relative performance (RP).

Kinetics of leaf nutrient concentration
Leaf concentrations of 15 nutrients were determined for 1,420 HEB lines, four introgression lines and 13 control cultivars. In total, nutrient concentrations from 2,591 HEB-25 plots, across two years (2017 and 2018), two treatments per year(N0 and N1 in 2016, respectively, Fun0 and Fun1 in 2017) and four stages per treatment (A to D, i.e. shooting, heading, grain filling and maturity), were measured by wet-lab chemistry with an average of 161.9 wet-lab plots per year x treatment x stage combination (Tables S2 and S3). Wet-lab data were acquired for all nutrients at every stage, except for N, whose concentration could not be determined at stages C and D in 2017 for technical reasons (Table S3). In addition, hyperspectral imaging data were collected for all field plots during the developmental stages plant shooting (A), heading (B), grain filling (C) and maturity (D), in parallel to collecting wet-lab samples (Table 1). Only at stage A in 2016, hyperspectral image data could not be collected due to technical failures of the hyperspectral camera system caused by improper set up of the camera or scatter sunlight on the image. In order to achieve a nearly full data set of leaf nutrient concentrations for all field plots, wet-lab data and hyperspectral imaging data were, subsequently, used to successfully model nutrient concentrations by applying machine learning methods for 14,653 HEB-25 plots (for details see chapter 3.2 and Tables S2 and S4). To our knowledge this experiment is among the largest field trials to model leaf nutrient concentrations based on hyperspectral imaging and machine learning. Grzybowski et al. concluded that the adoption of hyperspectral imagingbased phenotyping in quantitative genetics of crops may accelerate the study of genes controlling natural variation in biochemical and physiological traits [52]. In this regard, researchers can apply a wide range of imaging platforms [53][54][55] and machine learning routines [56,57], including rover-based hyperspectral imaging and PLS models as used in our study.
As a first level of validation, we verified if the predicted nutrient values of field plots achieved realistic values compared to wet-lab data.
The range of predicted values was similar to the range of wet-lab data for every combination of treatment x stage for all nutrients, except N and C (Fig. S1A). At stages C and D in 2017, the predicted N concentration values barely showed the decrease along stages that we would expect from N remobilization from leaves to grains. Also, the predicted values of C concentration at stage D were far higher than the range in the wetlab C data. Most likely, these discrepancies were caused by the absence of wet-lab data for C and N concentrations at stages C and D in 2017. Applying a two-year model based on the combined datasets of 2016 and 2017, resulted in predicted N and C concentrations for 2017, which were much closer to the predictions in 2016 (Fig. S1B). For this reason, wetlab and predicted values of leaf nutrient concentration were combined across treatment x stage combinations in one-year models, except for N and C at stages C and D in 2017, for which the results from a two-year model was used.
For all nutrients, the concentration levels were significantly different among stages (Tukey test, confidence level of 95%), except for Ni, whose concentrations did not differ among stages A, B and C ( Fig. 2A). Although the dataset for stage A is incomplete, it is included in Fig. 2 in order to delineate patterns of nutrient remobilization. By focusing mainly on the large variations among stages, it was possible to distinguish six different patterns of nutrient remobilization (Fig. 2B). N, P and K leaf concentrations gradually decreased from stages A to D, indicating that the remobilization of these elements was initiated already before heading time. Ca and B concentrations increased from stages A to C, and then dramatically decreased from stages C to D. Mg, S, Cu and Zn concentrations were quite stable from stages A to C, and then decreased from stages C to D. In contrast, Fe, Mn, Mo and Ni values were stable from stages A to C and increased from stages C to D. The concentrations of C remained quite stable along the whole plant development. The pattern of Na was particular, as its leaf concentration decreased mainly from stages A to B, being quite stable from stages B to D.
Reports on leaf nutrient remobilization during plant development is sparse regarding the investigated nutrients, plant species and growth conditions. Nevertheless, it seems clear that several plant species differ substantially for their patterns of leaf nutrient remobilization. For example, Siedliska et al. reported on a hyperspectral imaging trial with three plant species to predict phosphorus concentration in leaves over five stages of plant development [58]. Applying machine learning algorithms they could show that the lowest prediction accuracy was obtained for the earliest measured stage of plant development. They found varying correlations between leaf nutrient concentrations among the studied species. Total chlorophyll, for instance, showed high correlations (r>0.7) with Mg, K and N in sugar beet but not in celery or strawberry. Maillard and co-authors carried out a comparison of leaf remobilization of macro nutrients and micro nutrients among eight plant species, including barley and wheat, under field conditions [59]. In this study, the nutrient concentration of flag leaf was analyzed from about 40 % until 100 % of leaf life span. Taking into account that flag leaves are already unrolled at BBCH 39, the curve of nutrient concentration reported in the study covers a plant life period spanning approximately from our stages B to D. The gradual decrease of the macro nutrients N, P and K in leaves (Fig. 2) is in accordance with Maillard et al. [59] who reported that more than 80 % of the apparent N, P and K were remobilized. Also, the decline of Ca, Mg, S, and, to a lesser degree, of Zn and Cu from grain filling to maturity (stages C to D) and the lack of remobilization of Fe, Mo and Ni in our study confirm their findings. It was shown that Fe remobilization in senescing barley leaves may be stimulated by N deficiency, due to an increased phytosiderophore synthesis [60]. Others reported that remobilization of Zn from leaves to grains is substantial in barley [61]. The only remobilization not confirmed by Maillard et al. [59] is Mn, which was reported to decrease during leaf senescence, as opposite to our data. Considering that the results of Maillard et al. are exclusively based on biochemical analysis, the similarity of nutrient remobilization patterns between the two datasets supports the reliability of our model predictions based on field hyperspectral imaging.
The relative contribution of stages (A to D), experimental year, treatments (N0, N1, Fun0, and Fun1), fungicide application (N0, N1 andFun1) and soil N level (N0 vs remaining values) to the total variation of leaf nutrient concentration was evaluated by eta squared statistic, based on ANOVA analysis (Tables S5-S8). Based on one-way ANOVA, at average the stages accounted for 57.6 % of nutrient concentration variation, while treatment, year, soil N level and fungicide only for 13.1, 11.3, 6.9 and 2.5 %, respectively (Tables S5 and S6). The interaction effect of stages with treatment and year was analyzed by two-way ANOVA and resulted to be slightly higher than 10 %, both for treatment and year factors (Tables S7 and S8). From both analyses, it is clear that stages accounted for most of the total variation, whereas different levels of soil N amount and the use of fungicide had only a minimal effect on leaf nutrient concentration.
The best linear unbiased estimator (BLUE) values of all investigated traits were determined along the four treatments for every barley line, based on the whole dataset (wet-lab and predicted data). Table 2 shows the descriptive statistics of BLUEs, coefficient of variation (CV) and heritability (h 2 ) of stage and kinetic traits. Kinetic traits showed consistently higher heritabilities than stage traits ( Table 2). In particular, h 2 for BC.a reached at least values of 0.8 for all nutrients except K, B, Fe, Mo, Na and Ni, thus indicating that leaf nutrient remobilization from heading until grain filling is highly heritable. Thus, the mere value of nutrient concentration is not optimal for genetic analyses, as kinetic traits have the advantage of higher heritability. For all nutrients, except Table 2 Descriptive statistics of stage and kinetic traits for macro nutrients and micro nutrients including number of observations (N), BLUEs (Mean), standard deviation (SD), minimum (Min), maximum (Max), coefficient of variation (CV) and heritability (h 2 ). Traits are explained in Table 1.
Ca and Ni, the trait exhibiting the largest variability was D.c, i.e. the leaf nutrient concentration at maturity stage. In perspective of breeding applications, high values of both phenotypic variability and heritability are important for the potential use of a specific trait, although they are often inversely proportional, similarly to many cases shown in Table 2. At this regard, it is worth noticing that the D.c trait of N, P, B, Fe, Mn, Mo, Ni and Zn exhibited a good compromise of variability (CV > 15 %) and heritability (h 2 > 0.5). Similar heritabilities were reported by Herzig et al. [43] based on hyperspectral imaging and modeling of nutrient concentrations in barley grains. The authors reported highest heritabilities for Ca and N (h 2 >0.7) and medium heritabilities between 0.4 and 0.6 for 11 additional nutrients.
Stage and kinetic traits were then correlated to plant development traits and grain yield components (Fig. 3 and Table S9). Leaf concentration at maturity (D.c) of Cu, Fe, Mn, Mo and Ni showed negative correlation (− 0.56 ≤ r ≤ − 0.44) to the duration of plant developmental phases (SHO, HEA, MAT), indicating that a slow growth may lead to depletion of these elements in the leaf (Table S9). D.c values of N, P, K, Ca, Mg, S, B and Zn were slightly negatively correlated (r ≤ − 20) to grain yield (YLD). Considering that grain nutrient concentration tends to negatively correlate with YLD, the latter observation suggests the possibility that a high concentration in leaf of these nutrients at maturity could lead to an increased concentration in the grain. This pattern is more evident for N, as the correlation between its concentration at stages D and YLD was the lowest among all nutrients (r = − 0.39). Regarding the correlations between nutrients, it is worth mentioning the highly positive correlation among Cu, Fe, Mn, Mo and Ni concentration at maturity, with r values ranging from 0.90 to 0.94. In Fig. 3, the correlation among kinetic traits is separately presented for N, P, and K, as these are the most important nutrients for breeders. Differently to all other nutrients, all leaf traits of N, P and K, which include stages B and C, are positively correlated at a moderate level (r ≈ 0.4− 0.6) with SHO, HEA, MAT. This pattern suggests that a prolonged plant development is required for the accumulation of N, P and K in leaf from heading to grain filling. The C.c and kinetic traits of Cu follow a similar trend, as they are moderately correlated to MAT (0.32 ≤ r ≤ 0.37) and clearly correlated to the C.c and kinetic traits of P (0.56 ≤ r ≤ -0.44). In addition, for N the D.c values show a clear positive correlation to the other stage and kinetic traits (at r ≈ 0.5), while for P and K, D.c is poorly correlated to the other leaf traits. This indicates that N leaf concentration tends to follow consistent trends along the whole plant development. i.e. a high value at stage B tends to correspond to a high value at stage D. For P and K, instead, trends are mainly consistent between heading and grain filling phases, but not towards maturity .

Prediction of leaf nutrient concentration
The predictions of leaf nutrient concentration were based on a oneyear model, i.e. predictions related to single field plots were carried out by using exclusively the wet-lab data of the same year. The predictive model was first evaluated for the predictions carried out on the known dataset, i.e. for the field plots, which were analyzed by wet-lab methods. For this purpose, we employed two commonly-used parameters, namely the coefficient of determination (R 2 ) of the linear relationship between wet-lab and predicted data, and the mean relative error (MRE) ( Table 3). In many cases, R 2 and MRE were in agreement in measuring the model performance, with low MRE values corresponding to high R 2 values. In general, predictions of macro nutrient concentrations were better than for micro nutrients. Most macro nutrients R 2 values were higher than 0.70, while most of MRE values were lower than 0.25, indicating a high percentage (>70 %) of explained variance. In most cases micro nutrients, instead, exhibited intermediate values for R 2 and MRE, indicating a lower prediction power if nutrient concentrations are markedly reduced. Also Pandey et al. reported on predicting micro and macro nutrients in maize and soybean leaves based on hyperspectral imaging and PLS [62]. The authors found prediction accuracies comparable to our study with high R 2 values >0.80 for macro nutrients N, P, K and S and lower R 2 values <0.73 for micro nutrients B, Fe, Mn, Na and Zn.
However, in several cases R 2 and MRE appeared to be discordant, as for instance predictions for C (very low R 2 and MRE values), and for B and Fe (high R 2 and MRE values). For N, R 2 was very high (0.91), based on 2016 data, whereas the value was very low (0.33) for 2017. When the R 2 for N was estimated based on the combined data from both years ('2016 + 2017' column in Table 3) the value was high (0.90). All values of MRE for N, instead, were low, thus indicating high prediction quality, particularly in 2017 data.
The contradictions between R 2 and MRE values prompted us to critically analyze the reliability of these parameters as a measure of model performance. As a working hypothesis, we supposed that a source of error could derive by the nutrient remobilization in plants from stages A through D, as it may cause high differences of data range among nutrients. For this reason, we developed a parameter estimating the range of nutrient remobilization across stages for each nutrient (here denoted as 'remobilization index' [RI]; explained in the Material & Method section, paragraph 2.5) and tested its correlation with R 2 and MRE (Fig.  S2). In addition, a further parameter for the evaluation of model performance was calculated, namely the residual prediction deviation (RPD) [63]. R 2 was strongly correlated to RI (r = 0.82, Fig. S2) and only moderately to RPD (r = 0.59), while the correlation to MRE was weak (r Fig. 3. Correlation plots based on BLUEs between the stages and kinetic traits of nitrogen (N), phosphorus (P) and potassium (K) and plant development and yield components traits. See Table 1 for abbreviations and measuring units.
= 0.48). R 2 and RPD were strongly correlated to each other, MRE instead was neither correlated to R 2 nor to RPD. The explanation for this pattern has to be found in the formulas underlying R 2 , MRE and RPD. Both R 2 and RPD are highly affected by the extend of data range. R 2 depends on the difference between observed values and their mean, which in turn increases with nutrient remobilization. Likewise, RPD depends on the standard deviation of observed values (SD), whose value is also strongly affected by remobilization. In contrast, MRE is based on the relative difference between every observed value and the corresponding prediction, and therefore it is independent of data range, and hence remobilization. The weak positive correlation between MRE and RI can be explained by the fact that a high remobilization implies in many cases low nutrient concentration values at stage D, to which would probably correspond an increase of technical error of biochemical and/or spectral measurements. We conclude that, for the particular case of leaf nutrient remobilization study, MRE, instead of R 2 and RPD, is a suitable parameter for evaluating model performance.
By using MRE based on both experimental years as the main parameter to judge model performance, we could recognize three clusters of nutrient predictions: good prediction performance, at MRE below 0.20 for N, C, S, Cu; acceptable prediction performance (0.20 < MRE ≤ 0.32) for P, K, Ca, Mg, Mn, Mo, Ni and Zn; low prediction performance, at MRE greater than 0.60, for B, Fe and Na. The apparent contradiction between non-optimal MRE values and high heritabilities (compare Tables 2 and 3) may be explained by the fact that the BLUEs estimation probably leads to a compensation of prediction error.

Genome-wide association study (GWAS) of leaf nutrient concentration
A GWAS was carried out on the BLUEs of leaf nutrient stages and kinetic traits of population HEB-25 and its domesticated parent Barke. The highest levels of prediction ability of the GWAS model were reached for N, P, K and Ca (from 0.20 to 0.27), while among traits the highest value was for BC.a (0.26) (Table S10). Table S11 shows the distribution of the statistically significant marker-trait associations (MTA) along different levels of detection rate (DR). A total of 23,087 MTAs was detected for all nutrients along the six investigated traits. We applied a threshold of DR ≥ 50 % for selecting the most robust MTAs, thus resulting into 278 highly significant markers (Table S12). The subset of robust MTAs was quite evenly distributed across traits and across nutrients (Table S13). It is worth to notice the low number of significant markers (9) for C, which is most likely due to the very low variance of this element (Table 2). On the contrary, P and B are the nutrients with the highest number of MTAs (30 and 31, respectively).
The markers included in the 278 robust MTAs were grouped to 44 QTLs (Table S14): 18 QTLs are associated to a single nutrient, while the remaining ones are associated to more than one nutrient, in a range from two up to 12. Within every QTL, the marker associated to the highest DR score (DR max) was defined as the peak marker. The ten QTLs showing the highest DR max values were then selected as genomic hotspots (Table 4). Among the hotspots, QTL-4H-1 represents the case with the highest potential for both agricultural applications and biological insight. It is located at the telomeric region of chromosome 4H, ranging from 3.77 to 6.24 Mbp. This QTL exhibited the highest possible value of DR max, and was associated with the highest number of effected nutrients (12) among all QTLs, i.e. to all nutrients except C, S and Na. Moreover, QTL-4H-1 included 44 out of the overall 278 significant MTAs.
In particular, the QTL-4H-1 peak marker (JHI_Hv50k_2016_228144) included 25 MTAs. This observation prompted us to a detailed analysis of the peak marker effect, i.e. the mean relative performance of wild barley allele compared to the domesticated (Barke) allele (Fig. 4A). The peak marker exerts positive effects on N, P, K and Cu concentration in leaves, while negative effects on Ca and B concentration. At this point, it is worth noticing that Ca and B are the only nutrients showing a gradual increase of concentration in leaf from stages A to C (Fig. 2), thus indicating a peculiar remobilization physiology for these two elements.
In relation to these results, it is useful carrying out a QTL metaanalysis by relating the relative performance of QTLs to all the markers located within the QTL interval, whose effects on different phenotypic traits were investigated in previous GWAS studies on grain nutrient concentration in population HEB-25 (Table S15). QTL-4H-1 exerted positive effects on grain protein concentration (i.e. N bound in proteins, free amino acids and other sources) and P, K, Ca, Cu, Fe, Mo and Zn concentration in grains [43]. Therefore, the QTL-4H-1 effects on N, P, K and Cu concentration are positive in both leaf and grain, while for Ca the trends are opposite (negative effect in leaf, positive in grain). Such a pattern supports the idea that a high nutrient concentration in leaf during the overall plant development is required for increasing nutrient concentration in the grain, at least for N, P, K and Cu. This conclusion is in agreement to a previous study on wheat, reporting a positive correlation between N concentration in flag leaves and in mature grains [64]. In the latter study, leaf samples were collected 10 days after anthesis, which corresponds approximately to our stage C. In our dataset, N C.c was positively correlated to all other stage and kinetic traits, showing the highest value (r = 0.85) in relation to BCD.a (Fig. 3), which is proportional to leaf N concentration from heading until maturity stage, thus including a large part of plant development. Consistently, the significant cases of relative performance of the QTL-4H-1 peak marker include the BCD.a trait for all nutrients, and all other traits always follow the same trend of BCD.a (Fig. 4A). In case of N, the relative performance at stage D was remarkably higher compared to the other N traits. Interestingly, the variability of N concentration at stage D was 5-fold higher compared to all other stage and kinetic traits ( Table 2). This indicates that the D.c trait could have high potential for Table 3 Coefficient of determination (R 2 ) and mean relative error (MRE) of the predictions for leaf nutrient concentration based on PLS modelling or MLP modelling, the latter indicated by *. '2016' and '2017' indicate the two experimental years. '2016 + 2017' refers to R 2 and MRE values calculated on combined data from both experimental years.
fine-tuning grain nutrient concentration. At this point, it is useful to compare the nutrient effects between the QTL-4H-1 and the NAM-1 (NAC-type transcription factor) gene: both wild barley alleles induce an increase of grain protein concentration; however, while the wild barley allele at NAM-1 induces early senescence, the wild allele at QTL-4H-1 induces late senescence (i.e. 1-2 days more until maturity) [36]. Therefore, QTL-4H-1 may represent an additional tool for balancing nutrient concentration and senescence. An interesting observation was reported by Waters et al. [65]. The authors found that a major effect of the NAM-B1 gene in wheat was the increased remobilization of nutrients from vegetative tissue to grains. They showed that Fe and Zn remobilization from wheat leaves was blocked in RNAi knockdown lines of the NAM-B1 gene.
The relative performance of the QTL-4H-1 wild allele peak marker was then decomposed into the contributions of single HEB-25 families. Fig. 4B shows the distribution of the relative performance among families on the trait exhibiting the highest absolute peak marker effect for every nutrient. Family 17 induced the highest positive performance (27.9 %) on N concentration at stage D (N D.c) (Fig. 4B). Interestingly, N D.c showed the highest variability (CV = 15.9 %) among N traits and a good heritability value (0.59) ( Table 2), thus suggesting potential for breeding application.

Conclusion
In a very large field experiment consisting of 5,748 field plots, we could successfully predict the change of leaf concentrations of 15 micro and macro nutrients over four developmental stages. For this, we used appr. 15 % wet lab data for model training, field based hyperspectral imaging data and machine learning algorithms to model leaf nutrient concentrations. Heritabilities of the studied nutrients included middle and high values. The use of derived traits (kinetic traits) allowed a further increasing of heritabilities for all investigated nutrients (except Fe and Ni).
Wet-lab and predicted values of leaf nutrient concentrations were consistent in delineating patterns of changes along plant development. A prolonged plant growth until grain filling stage positively correlated with the accumulation of N, P, K and Cu in leaf, which in turn may lead to enhanced concentration of protein, P, K and Cu in grains. We identified a QTL (QTL-4H-1), which exerted a positive effect on the leaf concentration of these nutrients. This QTL was previously reported to have a positive effect also on protein, P, K and Cu concentration in the grain. Moreover, QTL-4H-1 was associated to a slight delay of senescence, thus possibly representing an additional tool together with the NAM-1 gene for fine-tuning grain nutrient concentration and plant development.
Our results sustain the reliability of high-throughput prediction of leaf nutrient kinetics based on hyperspectral field imaging and subsequent application of machine learning modelling. We demonstrated its potential for GWAS studies, aiming at understanding the genetic basis of nutrient remobilization under varying environmental and agronomical conditions. This may provide breeders a new tool for nutrient assessment in large-scale field experiments to ultimately select genes and genotypes better prepared for biofortification to increase nutrient concentration in harvested plant organs.

Author contributions
KP, AM, US and HPM planned and designed the research, MG, MS, Table 4 List of ten hotspot QTLs exhibiting the highest value of detection rate along phenotypic traits (DR max) and the nutrients associated to each QTL. ‚From (bp)' and ‚Until (bp)' indicate the start and the end of the QTL interval based on barley reference sequence Refseq 1.0, given in Bayer et al. [67].  HCK, AB, SW, AG, YATM and AMJ performed the experiments and analyzed the data, MG and KP wrote the manuscript, All co-authors read and revised the manuscript.

Data availability
All data of this study are available within the paper and the supplementary data published online.

Declaration of Competing Interest
The authors declare that they have no competing interests.