Phenotypic data
The distributions of BLUEs for each of the 14 end-use quality traits are illustrated in Fig. 1. There were continuous phenotypic distributions for all end-use quality traits. For grain characteristics, BLUEs ranged from -0.8 to 54.1 for SKCS hardness, 2.1 to 3.2 mm for SKCS size, 28.5 to 48.3 mg for SKCS weight, 55.6 to 65.4 kg/hL for test weight, and 8.3 to 15.7% for grain protein. For milling traits, BLUEs ranged from 36.5 to 54.9% for break flour yield, 57.1 to 74.3% for flour yield, 0.20 to 0.51% for flour ash, and 67.0 to 98.6 for milling score. For flour parameters, BLUEs ranged from 6.4 to 12.4% for flour protein, 3.4 to 18.2 g/mL for SDS sedimentation, 44.5 to 73.1% for water SRC, and 11.6 to 26.2 mL/g for FSV. For cookie diameter, BLUEs ranged from 7.8 to 9.7 cm. For all of these traits, differences in phenotypes would be considered to be industrially significant, with many values below minimum targets [25]. Broad sense heritability estimates and correlations between traits were previously described by Aoun et al. [27]. Moderate to high broad sense heritability (H2= 0.46-0.70) was observed for all traits except for grain and flour protein content (H2= 0.18 to 0.19) [27].
Population structure and linkage disequilibrium
Of the 40,518 SNPs, there were 14,102 (34.8%) SNPs on the A genome, 16,626 (41.0%) SNPs on the B genome, 8,656 (21.4%) SNPs on the D genome, and 1,134 (2.8%) SNPs on unaligned (UN) chromosome(s). PCA based on the first two PCs showed minimal clustering in wheat genotypes, which was expected since the plant materials in this study were from the same wheat breeding program (Fig. 2). The first 10 PCs accounted cumulatively for 26.3 % of the variation. The first four PCs explained 5.3%, 4.0%, 3.3% and 3.0% of variation, respectively. The genome‑wise LD dropped to an r2 threshold of 0.1 within 6.5 Mb on average (Supplementary Fig. S1). LD decayed to 0.1 at ~ 2.5–5.0 Mb for chromosomes on the A genome, to 5.0–12.0 Mb for chromosomes on the B genome, and to 5.0–10 Mb for chromosomes on the D genome (Supplementary Fig. S2).
Genome-wide association mapping
GWAS model selection
The best models within each method were selected based on examination of Q-Q plots. For MLM, we selected MTAs from the K (Kinship) model. Using FarmCPU, K + 2PCs (Kinship and Q based on the first two PCs) was selected to model SKCS hardness, SKCS weight, grain protein content, flour yield, flour ash, milling score, SDS sedimentation, water SRC, and cookie diameter, whereas K + 3PCs (Kinship and Q based on the first three PCs) was selected to model the remaining traits, SKCS size, test weight, break flour yield, flour protein content, and FSV. For BLINK, we selected K + 4PCs (Kinship and Q based on the first four PCs) for all traits.
In contrast to the Q-Q plots generated from FarmCPU models, the Q-Q plots from MLM and BLINK did not show a sharp deviation of the observed P-value distribution from the expected P-value distribution (Supplemental Fig. S3, S4, S5). These results suggest that FarmCPU provided a better control of false negatives and false positives compared to MLM and BLINK. Thus, only association mapping results from FarmCPU will be discussed in this study (Tables 1-4, Supplemental Fig. S6). MTAs generated from MLM and BLINK are provided in Supplementary Table S3, S4.
Marker-trait associations
Based on the LD between markers, each MTA identified from the FarmCPU models represent a distinct locus or QTL. Considering all traits together, a total of 178 significant MTAs were identified across all wheat chromosomes (Tables 1-4). Sixty-two MTAs were detected on the A genome, 77 MTAs on the B genome, 34 MTAs on the D genome, and five MTAs on unaligned (UN) chromosomes. Chromosome 1B and 7A carried the highest number of MTAs (n=16), whereas chromosome 4D had only a single MTA. The favorable alleles and their corresponding frequencies are described in Tables 1-4. There were 12 large-effect markers associated with 11 traits (1-2 markers per trait) (Table 5). For SKCS size, FSV, and cookie diameter, all significant markers had small effects.
For grain characteristics, five MTAs with large effects were detected on chromosome 1B, 2B, 4B, 5A, and 6B (Table 5). Markers S5A_480515221 and S6B_705613777 were associated with SKCS hardness and impacted the hardness index by 6.7 and 7.9 units on average, respectively. Marker S2B_533178165 was associated with SKCS weight and influenced the phenotype by 3.55 mg. For test weight, S4B_413497949 had the largest effect and resulted in 1.2 Kg/hL difference in the phenotype on average, whereas for grain protein content, S1B_46883868 had the largest effect with 0.85% increase/decrease in the phenotype. S1B_46883868 was also associated with flour protein content and affected the trait value by 0.63% on average.
For milling traits, five markers had large effects and were detected on chromosomes 1B, 1D, 5A, and 6B (Table 5). Marker S1B_653681752 was associated with break flour yield and flour yield and influenced trait values by 2.9% and 2.7% on average, respectively. An additional large effect marker was associated with flour yield: S6B_19335996 affected the trait value by 2.5%. Marker S4A_120144412 was associated with flour ash and influenced the phenotype by 0.03% on average. For milling score, S1D_14707739 and S5A_20640566 affected the phenotype by 3.8 and 2.4 units, respectively.
For flour parameters, there were four markers with large effects located on chromosomes 1B, 1D, and 4A (Table 5). In addition to break flour yield, S1B_653681752 also influenced water SRC by 4.9% on average. For SDS sedimentation, two large effect markers were identified including S1D_121990680 and S1D_411063068, which affected the trait values by 1.3 and 1.4 g/mL, respectively. Except for S5A_480515221, S4A_120144412, and S1D_121990680, which were associated with SKCS hardness, flour ash, and SDS sedimentation, the favorable alleles for the remaining nine large effect markers were present in high frequencies (86-93%) in this soft white wheat germplasm.
Loci associated with at least two end-use quality traits
Among the 178 MTAs identified in this study, there were 12 significant SNPs that were found to be associated with more than a single end-use quality trait (Table 6). In addition to the two large‑effect MTAs (S1B_46883868 and S1B_653681752) discussed above, there were ten small‑effect markers that were associated with at least two end-use quality traits. For each of the 12 markers that was associated with more than a single trait, there was desirable linkage between the favorable alleles. This suggests that these markers are having desirable pleiotropic effects and could be useful to simultaneously breed for more than a single end-use quality trait. Based on pairwise LD estimates between physically close markers, there were additionally five loci on chromosomes 1A, 1B, 6B, 7A, and 7B that were associated with more than a single end-use quality trait (Table 6). For each of these loci, LD between significant markers that were associated with different traits was higher than the r2 threshold of 0.1. For three of these loci, S1A_586706397/S1A_587581129, S6B_27918221/ S6B_29771821 and S7A_730416426/S7A_731026067, there was desirable linkage between marker alleles in each locus, whereas for the other two loci S1B_561712520/S1B_569507932 and S7B_624947199/S7B_636744313, there was unfavorable linkage. Therefore, for the latter two loci, selecting for one trait could negatively affect the other trait.
Co-localized MTAs with previously identified end-use quality genes/QTL
A total of 35 annotated genes were located close to the physical positions of the 12 large effect markers. The putative functions of these genes are described in Supplementary Table S5. In addition, we found 13 GWAS MTAs available at the IWGSC sequence repository that were within the genomic regions of the 12 large effect markers identified in our study. These GWAS MTAs from previous studies were associated with thousand kernel weight, test weight, grain fill duration, grain protein content, SDS sedimentation, and grain minerals (Cu and Zn) (Supplementary Table S5). Furthermore, comparative mapping (based on physical positions of molecular markers) between all the 178 identified MTAs from this study and end-use quality QTL/genes from previous genetic studies [2, 4, 6-17, 19-22] showed that 40 MTAs were positioned within genomic regions of previously discovered end-use quality genes/QTL (Supplementary Table S6).
Of the 16 identified loci for SKCS hardness in this study, 10 loci were found within genomic regions of previously reported grain hardness QTL (Supplementary Table S6). For instance, SKCS hardness associated markers in this study, S1A_583894689, S3A_690786387, S3B_718850970, S5A_480515221, S6B_130163859, S6B_705613777, and S7A_12011069 were located close to the positions of previously reported grain hardness associated markers/QTL, QGh.caas-1A (~575 Mb, [9]), wPt-4725 (709 Mb, [12]), QKh.WJ-3B.3 (~695 Mb, [8]), S5A_463766631 (464 Mb, [22]), IWB11485 (121 Mb, [2]), S6B_703822990 (704 Mb, [22]), wPt-0744 (0.2 Mb, [12]), respectively. Similarly, S1B_635869100 was positioned 8-13 Mb from Qgh-1B [13], QKh.WY-1B.1b [8], QGh.cass-1B [7], QNhi.hwwgr-1BL [10], and QKH.ksw-1B [21]. On chromosome 5B, S5B_549556474 identified in this study was found close to Qshi.hwwgr-5BL (566-571 Mb, [10]) and a QTL flanked by the SSR marker wmc289 (556 Mb, [11]). In addition, a MTA on chromosome 7B, S7B_643501407, was located at ~8 Mb from QKh.WJ-7B.1B [8] and QGh.caas-7B.1b [7]. Our comparative mapping showed the importance of these 10 previously characterized loci in controlling kernel hardness and suggests that the remaining six loci could be novel.
Three loci associated with SKCS size were located close to previously identified kernel size QTL (Supplementary Table S6). For instance, S2B_154846350, associated with SKCS size in this study, was located at 4-5 Mb from IWB30179 [2] and QKd.cob-1A (~ 134 Mb, [14]) that were associated with SKCS size and kernel diameter, respectively. Similarly, S2D_563799166 and S5A_578074731 were close to previously identified kernel diameter QTL, QKd.hwwgr-2DL (~ 552 Mb, [10]) and QKD.ksu-5A1 (568 Mb, [21]), respectively. For SKCS weight, S2D_563799166 identified in this study was positioned at ~12 Mb from QSkw.hwwgr-2DL (552 Mb, [10]) (Supplementary Table S6). Two of our identified MTAs for test weight were found near previously identified test weight associated markers/QTL. This includes S2A_762292662 located close to IWB35564 (760 Mb, [2]) and S7B_40194878 located at 7-8 Mb from QTW.ksu‑7B [21] and IWB54370 [2] (Supplementary Table S6).
Four of the associated loci with grain protein content in this study were previously reported (Supplementary Table S6). This includes S1B_633974958, which was positioned close to QGpc.caas-1B.1 (628-638 Mb) and QWgc.caas-1B (~628-634 Mb), which were associated with grain protein content and wet gluten content, respectively [7]. Similarly, S4B_63121316 and S5D_543253602 were found close to associated markers with grain protein content gwm368 (60 Mb, [11]) and wPt-9788/wPt-0400 (560 Mb, [12]), respectively. On chromosome 7A, the grain protein content associated marker, S7A_731026067 was 18 Mb from Qgpc.7A.1 (wmc525, [19]), which was associated with both protein content and dry gluten content.
For break flour yield, the associated locus tagged by marker S3B_630394456 was at the physical position of IWA6254 (630 Mb) also associated with break flour yield [17]. For flour yield, five of the identified MTAs were mapped close to flour yield associated loci in previous genetic studies (Supplementary Table S6). For instance, S1A_9128313 and S1A_15686346 were in close proximity to QFY.ksu-1A (7 Mb, [21]). S1A_9128313 and S1A_15686346 were also within the genomic regions of the genes TraesCS1A01G010900 (5 Mb) and TraesCS1A01G039600 (16 Mb), respectively. TraesCS1A01G010900 (5 Mb) and TraesCS1A01G039600 were annotated as low molecular weight glutenin subunit and high molecular weight glutenin subunit, respectively. Another MTA on chromosome 1B associated with flour yield, S1B_653681752, was identified close to QFY.ksu-1B (649 Mb) [21]. Similarly, S5A_382294123 and S6B_19335996 were found in proximity to IWB76667 (384 Mb, [2]) and IWA7725 (27 Mb, [17]), respectively. Two flour ash associated loci in this study, S5A_3649534 and S5B_68052478, were positioned close to Qgac.cob-5A (gwm443, 11 Mb) and Qgac.cob-5B.1 (gwm540, 67 Mb), respectively, which were previously identified by El-Feki et al. [14] (Supplementary Table S6).
For flour protein, S2D_28763299 was located 15-17 Mb from QGpc.caas-2D [7], which was associated with grain protein content and QWgc.WY-2D.5 [8], which was associated with wet gluten content. Similarly, S7A_730416426 associated with flour protein content in this study was ~17 Mb away from qGPC.7A.1 [19], which was associated with grain protein content and dry gluten content (Supplementary Table S6).
Two SDS sedimentation associated markers in this study were previously identified (Supplementary Table S6). These loci include S1B_561712520 located close to IWB14950 (558 Mb, [2]) and the gene Glu-B1 (556 Mb) and S5B_539075483, which mapped close to QSsd.caas‑5B (527 Mb, [9]). Among water SRC associated markers, S1B_547973154, S1B_653681752, S2B_66559534, and S6B_29771821 were positioned near Glu-B1 (556 Mb), IWB27057 (652 Mb, [2]), IWA820 (44 Mb, [17]), and IWA7725 (27 Mb, [17]), respectively (Supplementary Table S6).