A simple, cost-effective high-throughput image analysis pipeline improves genomic prediction accuracy for days to maturity in wheat

High-throughput phenotyping and genomic selection accelerate genetic gain in breeding programs by advances in phenotyping and genotyping methods. This study developed a simple, cost-effective high-throughput image analysis pipeline to quantify digital images taken in a panel of 286 Iran bread wheat accessions under terminal drought stress and well-watered conditions. The color proportion of green to yellow (tolerance ratio) and the color proportion of yellow to green (stress ratio) was assessed for each canopy using the pipeline. The estimated tolerance and stress ratios were used as covariates in the genomic prediction models to evaluate the effect of change in canopy color on the improvement of the genomic prediction accuracy of different agronomic traits in wheat. The reliability of the high-throughput image analysis pipeline was proved by three to four times of improvement in the accuracy of genomic predictions for days to maturity with the use of tolerance and stress ratios as covariates in the univariate genomic selection models. The higher prediction accuracies were attained for days to maturity when both tolerance and stress ratios were used as fixed effects in the univariate models. The results of this study indicated that the Bayesian ridge regression and ridge regression-best linear unbiased prediction methods were superior to other genomic prediction methods which were used in this study under terminal drought stress and well-watered conditions, respectively. This study provided a robust, quick, and cost-effective machine learning-enabled image-phenotyping pipeline to improve the genomic prediction accuracy for days to maturity in wheat. The results encouraged the integration of phenomics and genomics in breeding programs.


Background
The efficient and precise phenotyping of a large population is one of the main tasks in breeding programs [1]. For example, the recording process of grain yield is currently difficult, time-consuming, and costly. The visual assessments are normally incapable of attaining small but important phenotypic variations [2]. Even with good scoring, only small fractions of phenotypes Open Access like canopy color can be recorded with the use of visual assessments. The scoring methods cannot statistically indicate the effect of stress on diverse germplasms [1,2].
Such barriers in phenotyping have motivated plant breeders to collaborate with engineers and invent modern technologies for high-throughput phenotyping (HTP) in greenhouses and fields [1]. The HTP will become more advantageous when it is a non-invasive and non-destructive method like proximal, remote sensing, and digital imaging [3]. The advances in data analysis have enabled machine learning (ML) to provide an accurate value of stress-related phenotypes [1]. A pipeline with a complete framework for fast feature extraction from high-throughput imaging can be used as a platform for real-time phenotyping [4][5][6][7][8][9][10].
The HTP platforms can include instruments such as RGB (read, green, blue), multispectral and hyperspectral cameras, spectrometer, normalized difference vegetation index (NDVI) sensors, and light detection and ranging (LiDAR) technology [1,3,11,12]. The RGB cameras are widely used in field phenotyping, especially for estimating canopy coverage [13][14][15][16]. In addition, the RGB imaging is used as an alternative to NDVI in some researches [14][15][16][17]. The assessment of senescence [18,19], crops nitrogen content [20], soil water evaporation [16], early vigor [21], and physiological yellowing [11] are conducted by digital RGB image analysis. Physiological yellowing which shows plant senescence and occurs naturally with time is used as an indicator of maturity or the impact of abiotic stress [11,22]. Moreover, some researches have provided successful protocols for designing, developing, and deploying high-efficiency image analysis pipelines to assess the quantity of plant response to biotic and abiotic stresses [2,23,24]. High-throughput image analysis by computer vision and ML for phenotyping iron deficiency chlorosis (IDC) in soybean [1], hyperspectral imaging for drought stress in cereals [25], and thermal imaging in spinach [26] are some of the recent successful reports.
Drought stress in the Middle East usually occurs at the end of the growing season when spike has already appeared and seed is at the development stage. In the Persian plateau, where most of the environments are arid or semi-arid, farmers are well-trained over the centuries to store rainwater throughout spring and irrigate farms with the stored water at the end of the growing season. The Persian farmers irrigate their farms two to four more times with the stored water after spike appearance to avoid yield loss due to late-season drought stress. This strategy leads to a significant increase in wheat grain yield [27]. The impact of drought stress and irrigation at the end of the growing season on different genotypes needs further investigations.
Genomic prediction (GP) [28] methods use all genomic information irrespective of their position, status [quantitative trait locus (QTL), causal mutation, linked marker, etc.], and the specific effect on the trait of interest. The GP model trained in the training set (TS) will be applied to the validation set (VS) to estimate the accuracy of predictions. HTP and genomic selection (GS) accelerate genetic gain in breeding programs [3]. The use of a major QTL as a fixed effect in a GP model increases the accuracy of GP [1]. In wheat, the selection is accelerated by adding traits like canopy temperature (CT) and NDVI as secondary traits or covariates in GP models [3,23,29,30].
Motivated by this, this study reported the impact of terminal drought stress (TDS) and well-watered (WW) conditions on days to maturity (DTM) in a highly diverse bread wheat germplasm through an ML-based imagephenotyping pipeline.

Plant materials and field trials
The association panel used in this study included 286 bread wheat accessions from Iran historical germplasm (199 landraces identified during 1931-1968 in the Persian plateau and 87 cultivars released during 1942-2014 in Iran). The plant materials were kindly provided by the University of Tehran (UT) and Seed and Plant Improvement Institute (SPII), Karaj, Iran. The detailed information about the association panel is provided in Additional file 1: Tables S1 and S2. The experiments were carried out at the Kheirabad Agricultural Research Station (36°31′51.7″N and 48°45′29.9″E) in Zanjan province during the 2017-2018 cropping season using two separate alpha lattice designs [31] with two replications for each. The plots were 1 m in length, 1 m in width, and 0.5 m apart. Drip irrigation method was used for watering with the use of two tapes for each plot. Irrigation was conducted every ten days until the spike appearance. Then, TDS was inducted by terminating irrigation for one design whereas another design WW for three more times.

Genotyping and quality control
We used the genotyping-by-sequencing (GBS) [32] method for genetic fingerprinting and Poland et al. [33] method for library construction. The genotyping method has been described for the association panel, previously [34,35]. Briefly, DNA was extracted by a modified cetyltrimethylammonium bromide (CTAB) method [36] and double-digested with PstI and MspI restriction enzymes, barcoded adapters were ligated to each DNA sample using T4 ligase, polymerase chain reactions (PCRs) were done using primers complementary to both adaptors, size-selection for 250-300 bp fragments was conducted using an E-gel system (Life Technologies, Inc.), and the size-selected library was sequenced on an Ion Proton sequencer (Life Technologies, Inc.). Sequence reads were trimmed to 64 bp sequences, and identical reads were grouped. Then, unique sequence tags were assigned to the sequence groups. The unique tags were aligned internally, whereas up to 3 bp interval alignment mismatch was allowed. The Trait Analysis by aSSociation Evolution and Linkage (TASSEL) software [37] was used to utilize the Universal Network Enabled Analysis Kit (UNEAK) pipeline [38] for SNP calling. SNPs with a missing rate of more than 20% and SNPs with a minor allele frequency (MAF) of less than 5% were removed. Unanchored SNPs were excluded too. The remaining missing data of the whole SNP data set were imputed in one step using the LD KNNi method [39] in the TASSEL software [37], whereas K = 10 was used in LD KNNi. Finally, 9047 SNPs were used for further analysis.

Population structure and molecular markers estimates
The population structure was evaluated by the Bayesian clustering approach with the use of an admixture model in the STRU CTU RE software [40]. The number of subpopulations (K) was assessed with the use of 10,000 burnin and 10,000 Markov Chain Monte Carlo (MCMC) for K = 1-10 in 10 independent runs. The best K value was estimated by ΔK statistic [41] in the structure harvester website (https ://taylo r0.biolo gy.ucla.edu/struc tureH arves ter). Two subpopulations (SBP-I and -II) were identified within the association panel. The SNP calling was performed for each subpopulation and 7714 SNPs for SBP-I and 5873 SNPs for SBP-II were identified. A number of 4785 markers were common between SBP-I and SBP-II, which were systematically separated and named as common markers marker set (CMMS). The molecular markers estimates were assessed for each chromosome using the full matrix option in TASSEL software [37].

Phenotypes
Phenotypic measurements included days to heading (DTH), days to maturity (DTM), duration of headingto-maturity (DHTM), plant height (PH), and grain yield/ m 2 (GY). For details on measurements of DTH, DTM, DHTM, PH, and GY and time of assessments, please refer to the manual "Physiological breeding II: a field guide to wheat phenotyping" [42].

Image acquisition
A Canon PowerShot SX30 IS camera was installed on a simple handheld phenocart. The phenocart height was 1.7 m. A flat L-shaped metal bar with 0.5 m long was installed on the phenocart. The camera was mounted on the L-shaped metal bar upside down, whereas the camera with the lens opened had 1.6 m distance from the ground. The phenocart was on the right side of the plots during the imagings (Fig. 1).
The images were captured two weeks after TDS induction from the plots. In addition, the images were taken with the Scene Intelligent Auto mode of the camera during two consecutive days from 10 a.m. to 2 p.m. when the weather was completely sunny. Therefore, no color correction was applied to the captured images. The flash function was kept off to have stable light too. All of the images were taken as RGB and stored in JPEG format with a resolution of 4320 × 3240 pixels (Additional file 2: Figure S1). In total, 1144 images were taken two weeks after TDS induction and used in the ML model.

Image processing
In order to avoid shade, shoe, empty space, margin, etc. all of the images were cropped to 500 × 500 pixels using Preview software, so that the cropped images could represent the color of the canopies more precisely (Fig. 2a,  d). A function was defined for the color threshold based on the CIELAB color space (L*a*b) [43] in MATLAB_ R2015b software. The cropped RGB images were converted to L*a*b color space. The first channel (L, from black (0) to white ( + 100)) was kept intact, the second channel (a, from green ( − 100) to red ( + 100)) was converted to half and defined from 0 to + 100, and the third channel (b, from blue ( − 100) to yellow ( + 100)) was also converted to half and defined from 0 to + 100 (Fig. 2b, e). The masked images were converted to binary format (Fig. 2c, f ). With the use of this strategy, the black pixels were an indicator of the range of cold colors (from the light illumination to the dark green and blue), and the white pixels were an indicator of the range of warm colors (from the light illumination to the dark red and yellow). Finally, the color proportion of the black to white pixels as a sign of the tolerance ratio (TOR) and the color proportion of the white to black pixels as a sign of the stress ratio (STR) were calculated for each plot and saved in a text file. The defined MATLAB function and the written code are provided in Additional file 3: Scripts S1 and S2.

Data analysis
Analysis of variance (ANOVA) was carried out for each phenotype under TDS and WW conditions separately using the proc mixed procedure in SAS software version 9.4 [44]. The data analysis model was as follow: Fig. 2 Image processing overview to assess tolerance and stress ratios under terminal drought stress (TDS) and well-watered (WW) conditions in wheat. a and d are cropped RGB images taken two weeks after drought stress induction under TDS and WW conditions, respectively. b and e are masked images in the Lab color space using defined function under TDS and WW conditions, respectively. c and f are masked images converted to binary format under TDS and WW conditions, respectively. Using this strategy, the black pixels represent non-dry tissues and the white pixels indicate dried tissues. The tolerance ratios were estimated as Tolerance ratio (TOR) = (Black pixels)/(White pixels) and the stress ratios as Stress ratio (STR) = (White pixels)/(Black pixels) where y ijk represents the observed phenotype of the ith genotype at the jth replication of the kth block within the jth replication, µ represents the overall mean, g i indicates the genetic effect of the ith genotype, r j indicates the effect of the jth replication, b k(j) shows the kth block effect within the jth replication and ε ijk shows the residual effect following N (0, σ 2 ε ). All effects were considered as random. The estimation of variance components was performed by the proc varcomp procedure, whereas all effects were considered as random. Heritability ( H 2 ) estimates were calculated based on each accession mean with an assumption of independence of effects using the following equation: where σ 2 g , σ 2 ε , and r represent the genotypic variance, residual variance, and the number of replications, respectively [45]. Best linear unbiased predictions (BLUPs) of genetic effect for each genotype were estimated under TDS and WW conditions using the R package lme4 [46] in the same model as described for the phenotypic analyzes. Then, the BLUPs were used for GP assessments.

GP strategy
For five-fold cross-validation (CV), 20% of accessions were randomly assigned to a VS, whereas all of the remaining genotypes were used as a TS. The whole process was repeated 100 times for each GP (The Bayesian analyses were implemented along with 10,000 iterations and 1000 burn-ins). The CMMS was used as a marker set for assessing genomic estimated breeding values (GEBVs). The accuracy of the GP was estimated as Pearson's correlation coefficient among GEBVs and BLUPs over TS and VS. The average of accuracies was reported across folds and repeats [47]. The GPs were implemented with seven different methods including genomic best linear unbiased prediction (GBLUP), ridge regression-best linear unbiased prediction (RR-BLUP), Bayesian A (BA), Bayesian B (BB), Bayesian C π (BC π ), Bayesian LASSO (BL), and Bayesian ridge regression (BRR) in iPat software [48]. A brief review of the GP methods is provided by Juliana et al. [49].
Four univariate (UV) GP models were defined. Five phenotypes (DTH, DTM, DHTM, PH, and GY) were evaluated in each of the UV models under TDS and WW conditions, separately. The UV1 model did not contain any covariate. TOR as a covariate was included in the UV2 model. STR as a fixed effect was included in the UV3 model. Both TOR and STR as fixed effects were included in the UV4 model.
In total, 280 analyses were conducted including 4 UV models, 5 phenotypes, 2 irrigation conditions, and 7 GP methods.

Field conditions
Plantings were conducted at the Kheirabad Agricultural Research Station in Zanjan province in the middle of October and weather conditions were recorded during the cropping season (Additional file 4: Figure S2). Zanjan province is located in a cold semi-arid climate zone.

Population structure and distribution of molecular markers
The existence of two main subpopulations was identified using the ΔK statistic (Additional file 5: Figure S3). The cluster membership coefficients (Q) indicated that the SBP-I contained 77 cultivars and 71 landraces, and the SBP-II included 128 landraces and ten cultivars (Additional file 6: Table S3). In the whole association panel, the highest number of markers was on chromosome 2B (419), while the lowest number of SNPs was on chromosome 4D (34) ( Table 1). The genetic map length was the longest for chromosome 3A (171.063 cM), while the shortest length was for chromosome 2D (85.027 cM) ( Table 1). The highest marker density was on chromosome 2B (3.76 Marker/cM), while the lowest marker density was on chromosome 4D (0.38 Marker/cM) ( Table 1). The B genome had the highest number of markers (2197), followed by the A (1794) genome and the D genome (794) ( Table 1).

Phenotypic data summary
The descriptive statistics, variance parameters ( σ 2 G and σ 2 E ), and heritability ( H 2 ) were estimated for all traits under TDS and WW conditions, separately ( Table 2). All traits had higher phenotypic values under the WW conditions (except STR) compared to the TDS conditions ( Table 2). In addition, the higher estimates of σ 2 G , σ 2 E , and H 2 were observed for all traits (except STR) under the WW conditions (Table 2). Pearson correlation coefficients were calculated for all traits under both TDS and WW conditions ( Table 3). The DTH and DHTM indicated the highest correlations under TDS and WW conditions (− 0.68 and − 0.73, respectively) ( Table 3). Furthermore, the DTH and PH were correlated under TDS and WW conditions (0.57 and 0.60, respectively) ( Table 3). The DTM and DHTM were positively (0.58 and 0.44, respectively) correlated under TDS and WW conditions (Table 3). However, the DHTM and PH were negatively (− 0.35 and − 0.40, respectively) correlated under TDS and WW conditions (  (Table 3).

GP
The prediction accuracies varied from − 0.06 to 0.45 ( Table 4). None of the traits indicated high prediction accuracy in the UV1 model, where no fixed effect was utilized in the GP models to estimate GEBVs (  Table 4). None of the highest of accuracies was identified using the BA and BB methods in the UV2, UV3, and UV4 models ( Table 4). The prediction accuracies of the DTM were increased three to four times using the UV2, UV3, and UV4 models under both TDS and WW conditions ( Table 4).

Discussion
Phenotypes with stable heritability are less sensitive to the GP method [50,51]. DTH, PH, and NDVI showed high heritability and were used as fixed effects in some studies [3,23,29,45]. Heritability and correlation among traits are important factors to attain higher prediction accuracy [3]. High broad-sense heritability (> 0.57) was observed for wheat vegetation indices with the use of unmanned aerial systems (UAS) [23]. In addition, the visual and digital assessments showed a 0.95 correlation for the physiological yellowing in wheat, whereas the digital assessments had 0.76 heritability [11]. In the present study, regardless of the correlation values, all agronomic traits had a positive correlation with TOR and a negative correlation with STR under both TDS and WW conditions, respectively. The heritability of TOR was 0.76 under TDS conditions, and the heritability of STR was 0.29 under WW conditions. As a conclusion, positive correlation and high heritability of TOR with DTM under TDS conditions, as well as negative correlation and low heritability of STR with DTM under WW conditions indicated the high adaptability of the association panel to drought stress.
In this study, the whole association panel was a mixed population (87 cultivars and 199 landraces). More accurate results were reported from mixed populations because more diversity in TS and more inbred genotype in VS would be available during the CVs [52][53][54][55]. In the breeding programs, a diverse or an inbred VS would be compared with a large and diverse TS containing high genetic diversity [53]. This approach will prevent the occurrence of a full relationship among genotypes in TS and VS, and consequently, more reliable results will be obtained [56,57]. Higher marker density will provide better prediction accuracy [58,59]. However, if MS covers the whole genome appropriately, the GP can predict all QTLs with stable linkage disequilibrium (LD) across subpopulations [28,60,61]. The present study used the markers which were common between subpopulations to obtain higher prediction accuracy [62]. RR-BLUP and GBLUP are mathematically equivalent [49]. RR-BLUP demonstrates more reliable results for QTLs with small effects [50]. If TS is closely related to the selected candidates, the GBLUP method will obtain a more nonadditive genetic variance [63]. The Bayesian methods can provide better results when the number of QTLs decreases and effect increases [51]. The genetic architecture of phenotypes would change GEBV [64,65]. In addition, adding secondary traits or covariates to the UV and multivariate (MV) GP models would increase prediction accuracy [3,29,45]. The results of the present study showed that all of the GP methods had the highest prediction accuracy for DTM (0.38-0.45) when both TOR and STR were used in the UV4 model under both TDS and WW conditions. The CT and NDVI as secondary traits in wheat improved the prediction accuracy of GY by 70% [29]. Furthermore, the manually taken images indicated 0.61-0.78 correlation with the visual scoring of the physiological yellowing in wheat [11]. The GP accuracy was about

Table 3 Pearson correlation coefficients for seven traits in an association panel including 286 Iran bread wheat accessions grown under terminal drought stress (TDS) and well-watered (WW) conditions in semi-arid environments, Iran
The numbers under the diagonal indicate correlation under TDS conditions.

Conclusions
The present study activated an ML-enabled image analysis pipeline to identify TOR and STR impact on the GP of the DTM under TDS and WW conditions. The results revealed the reliability of this pipeline for quantifying small phenotypic variations and integrating its advantages in genomic studies. The high prediction accuracy

Table 4 Genomic prediction (GP) accuracy for five agronomic traits in an association panel including 286 Iran bread wheat accessions grown under terminal drought stress (TDS) and well-watered (WW) conditions using high-throughput image analysis results as fixed effects in the univariate (UV) GP models
The average of accuracies was reported across folds and repeats.