Genome-wide association mapping of black point reaction in common wheat (Triticum aestivum L.)

Black point is a serious threat to wheat production and can be managed by host resistance. Marker-assisted selection (MAS) has the potential to accelerate genetic improvement of black point resistance in wheat breeding. We performed a genome-wide association study (GWAS) using the high-density wheat 90 K and 660 K single nucleotide polymorphism (SNP) assays to better understand the genetic basis of black point resistance and identify associated molecular markers. Black point reactions were evaluated in 166 elite wheat cultivars in five environments. Twenty-five unique loci were identified on chromosomes 2A, 2B, 3A, 3B (2), 3D, 4B (2), 5A (3), 5B (3), 6A, 6B, 6D, 7A (5), 7B and 7D (2), respectively, explaining phenotypic variation ranging from 7.9 to 18.0%. The highest number of loci was detected in the A genome (11), followed by the B (10) and D (4) genomes. Among these, 13 were identified in two or more environments. Seven loci coincided with known genes or quantitative trait locus (QTL), whereas the other 18 were potentially novel loci. Linear regression showed a clear dependence of black point scores on the number of favorable alleles, suggesting that QTL pyramiding will be an effective approach to increase resistance. In silico analysis of sequences of resistance-associated SNPs identified 6 genes possibly involved in oxidase, signal transduction and stress resistance as candidate genes involved in black point reaction. SNP markers significantly associated with black point resistance and accessions with a larger number of resistance alleles can be used to further enhance black point resistance in breeding. This study provides new insights into the genetic architecture of black point reaction.

association between the presence of fungi and black point development has been discounted by some workers [13][14][15], who pointed out that it may be caused by enzymatic browning following stress. Oxidases, such as peroxidases (POD) [15,16], polyphenol oxidase (PPO) [17,18] and lipoxygenase (LOX) [19], that catalyze oxidation of phenolic compounds to brown or black pigments (melanins and quinines) [18,20], may be triggered by high humidity during the later stages of grain filling. Susceptible varieties have higher POD [15,21] and phenylalanine ammonia-lyase (PAL) (an enzyme involved in phenolic acid biosynthesis) [21] activities.
Although several cultural, biological and chemical control strategies have been used to control black point, breeding resistant cultivars remains the most effective, economic and environmentally sustainable approach to control this disease [4,22,23]. Previous studies on the known genetic basis of black point resistance involved classical linkage-mapping methods using bi-parental populations [22][23][24], in which only two allelic effects can be evaluated for any single locus. Recent advances in genomics, particularly development of the wheat 90 K [25], 660 K (JZ Jia, pers. comm.) and 820 K SNP arrays [26] have made it feasible to genotype large germplasm collections with high-density SNP markers. As a result, the GWAS based on linkage disequilibrium (LD) has been widely adopted to investigate existing allelic diversity for important and complex agronomic traits. Compared with classical linkage-mapping, GWAS permits a more representative gene pool and a higher mapping resolution, because all historical meiotic events that have occurred in the ancestors of a diverse germplasm panel can be used [27]. Moreover, GWAS bypasses the expense and time of developing mapping populations, and enables the mapping of many traits in one set of genotypes, making the method more efficient and less expensive than linkage mapping [28]. Thus, GWAS has become a powerful alternative approach for linkage mapping [29]. GWAS has been applied to investigate a range of traits, including disease resistance [30,31], end-use quality [32], and yield components [33][34][35].
The Yellow and Huai River Valleys Facultative Wheat Region is one of the most important agricultural regions of wheat production in China with an area of 15.3 million hectares. Black point has become one of the important diseases in this region due to increased water management and fertilizer use. Breeding for black point resistance could be greatly improved by the identification and use of closely associated molecular markers. Although GWAS has become a powerful approach to dissect the genetic architecture for many traits, it has not been used to analyze traits related to black point. In the present study, we used a diverse panel of 166 elite wheat cultivars in GWAS to (1) dissect the genetic architecture of black point resistance, (2) identify SNPs significantly associated with black point resistance, and (3) search for candidate black point resistance genes for further study.

Marker coverage and genetic diversity
A total of 18,920 SNPs from the 90 K and 283,652 from the 660 K SNP array based on the consensus genetic maps and physical map (IWGSC, http://www.wheatgenome.org/) were chosen for GWAS of black point reaction in 166 wheat cultivars (Additional file 1: Table S1). After removing the SNPs with minor allele frequency (MAF) < 5% (28,935 SNPs) and missing data >20% (13,715 SNPs), 259,922 SNPs were employed for subsequent analysis (Additional file 2: Table S2). These markers spanned a physical distance of 14,063.9 Mb, with an average density of 0.054 Mb per marker. Total of 89,519 (34.4%), 146,270 (56.3%) and 24,133 (9.3%) markers were from the A, B and D genomes, respectively, with corresponding map lengths of 4934.5, 5179.0 and 3950.4 Mb. The marker density for the D genome (0.202 Mb per marker) was lower than that for the A (0.099 Mb per marker) and B (0.042 Mb per marker) genomes. The average genetic diversity and polymorphism information content (PIC) for the whole genome were 0.356 (0.009-0.500) and 0.285 (0.009-0.380), respectively. Both the genetic diversity and PIC of the A (0.365 and 0.291) and B (0.363 and 0.289) genomes were higher than the D (0.340 and 0.265) genome. The number of markers, map length, genetic diversity and PIC for each chromosome are shown in Additional file 2: Table S2.

Population structure and linkage disequilibrium
In the plot of K against ΔK, a break in the slope was observed at K = 3 followed by flattening of the curve, indicating that this panel consists of three subgroups, which was consistent with the results of principal components analysis (PCA) and neighbor-joining (NJ) tree analysis (Fig. 1). Subgroup I, the largest group with 62 accessions, was dominated by Shandong and foreign cultivars; Subgroup II consisted of 54 accessions, mainly comprising varieties from Henan, Anhui and Shaanxi provinces; Subgroup III had 50 accessions, most of which were from Henan province (Additional file 1: Table S1).
In total, 12,324 markers from the 90 K and 660 K SNP arrays were used to evaluate LD decay for the whole genome as well as the A, B and D genomes separately. Around 14.3% of all pairs of loci were in significant LD (P < 0.001) with average r 2 of 0.174 on a genome-wide level by the 90 K and 660 K SNP assays. The B genome contained the highest percentage of significant markers (44.2%), followed by the A (33.6%) and D (22.2%) genomes. The scatter plots of r 2 against physical distance (Mb) indicated a clear LD decay with increasing physical distance (Additional file 3: Figure S1). According to [28], the critical value for significance of r 2 was evaluated at 0.079, 0.083, 0.095 and 0.082 for the A, B, D and whole genomes, respectively. The point at which the LOESS curve intercepts the critical r 2 was determined as the average LD decay of the panel [28]. Based on this criterion, LD decay distance was about 8 Mb for the whole genome. The highest LD decay was observed in the D genome (11 Mb), followed by the A (6 Mb) and B (4 Mb) genomes (Additional file 3: Figure S1).

Phenotypic variations for black point reaction in the field
Continuous variation was observed across five environments (Additional file 4: Figure S2; Additional file 5: Table S3). The resulting best linear unbiased predictors (BLUPs) for black point scores across all environments ranged from 1.6 to 80.6% with an average of 23.3% (Additional file 4: Figure S2; Additional file 6: Figure S3), presenting a wide range of reactions for black point and indicating that this diversity panel was ideal for conducting GWAS. Analysis of variance (ANOVA) for black point scores revealed significant differences (P ≤ 0.001) among genotypes (G), environments, and genotype × environment (G × E) interactions ( Table 1). The broad sense heritability (h 2 ) estimate for black point scores across all five environments was 0.62, indicating that much of the phenotypic variation was derived from genetic factors and therefore suitable for further association mapping.

Relationship between black point reaction and the number of resistance alleles
To further understand the combined effects of alleles on reaction to black point, we examined the number of favorable alleles in each accession. The numbers of favorable alleles in single accessions ranged from 5 to 21, compared to 4 to 20 unfavorable alleles (Additional file 1: Table S1). The relationships between black point BLUP values and numbers of favorable and unfavorable alleles estimated by linear regression showed a dependence of black point BLUP values on the number of favorable alleles with r 2 = 0.85 (Fig. 4a), and number of unfavorable alleles with r 2 = 0.85 (Fig. 4b). Thus, accessions with more

Discussion
The diversity panel, including released cultivars, advanced lines and landraces from different ecological regions, thus had a high genetic diversity with a wide range of reaction to black point. Our data showed that 86.1% (143) of the 166 accessions were susceptible to black point (black point score > 10%), indicating that black point is a considerable threat to wheat production throughout the world. However, most of the previous studies for black point were mainly conducted on pathogen identification, biological characteristics, disease cycle and control [7,13,38]. Thus, it is necessary to select cultivars highly resistant to black point and to identify markers significantly associated with resistance to facilitate breeding for resistance by MAS.

Genetic diversity, population structure and linkage disequilibrium
The mean genetic diversity and PIC of 0.356 and 0.285, respectively, indicated higher polymorphism than in previous reports [39,40]. Our diversity panel thus has high genetic diversity and approximately reflected the genetic diversity in winter wheat from the Yellow and Huai River Valleys Facultative Wheat Region. More than 56% of SNPs had PIC of 0.20-0.40, which is deemed as a suitable range for GWAS [41]. Furthermore, the A and B genomes had higher genetic diversity and PIC than the D genome, consistent with previous reports [30,40] (Additional file 2: Table S2). All results indicated that our diversity panel has high genetic diversity and was suitable for GWAS.
The diversity panel could be divided into three subgroups (Fig. 1), and the characterization of the subgroups was largely consistent with geographic origins and pedigrees. For example, Zhongmai 871, Zhongmai 875 and Zhongmai 895, which were derived from Zhoumai 16, clustered with Zhoumai 16 in group 3 (Additional file 1: Table S1). Numerous studies have shown that the lack of appropriate correction for population structure can lead to spurious MTAs [42][43][44][45]. Consequently, to eliminate spurious MTAs resulting from population structure, subpopulation data (Q matrix) were considered as fixed-effect factors, whereas the kinship matrix was considered as a random-effect factor, and a MLM implemented in Tassel v5.0 and FarmCPU were adopted for association analysis in the current study [36].
The LD decay affects the precision of GWAS and this is influenced by many factors like population structure, allele frequency, recombination rate and selection [44,46,47]. Previous studies reported that LD decay in common wheat ranged between 1.5-15 cM using SSR [28,46,48], DArT [33] or SNP [30,47] markers. In this panel, the LD decay was about 8 Mb for the whole genome (Additional file 3: Figure S1), consistent with previous reports. The LD decay of the D genome (11 Mb) was higher than the A (6 Mb) and B (4 Mb) genomes (Additional file 3: Figure S1), also consistent with previous studies [47][48][49], suggesting that fewer markers are needed for GWAS in the D genome than the A and B genomes. The marker densities for the A, B and D genomes were 0.099, 0.042 and 0.202 Mb/marker, and thus highly reliable for detecting MTAs with respect to LD decay in the diversity panel according to Breseghello and Sorrells [28]. The reason for the high LD of the D genome is mainly due to limited infusion of Aegilops tauschii in the evolutionary history of common wheat [38,49]. The average r 2 (0.174) values observed between linked loci pairs were higher than in previous studies [46,50]. Reif et al. [51] reported that LD (r 2 ) is expected to be higher in released cultivars than landraces. Moreover, Würschum et al. [52] indicated that QTL with small effect can be detected at higher LD (r 2 ), whereas only QTL with large effects can be detected at lower LD (r 2 ). Our results thus suggested a high mapping resolution and strong QTL detection power for black point resistance.

Comparison of the 90 K and 660 K SNP arrays
One of the key factors for GWAS is high marker density in whole genomes because sparse coverage reduces the power of marker identification [53]. Although the 90 K SNP array has emerged as a promising choice for highdensity, low cost genotyping [34,54], the presence of large gaps, particularly low coverage for the D genome, reduces the power of marker identification and decreases the precision of QTL mapping. To resolve the problem, the GWAS for black point resistance was performed using 259,922 markers from the 90 K and 660 K SNP arrays, providing a greater coverage of the genome. Only 8 loci were identified by the SNPs from 90 K array, whereas 23 were detected by the 660 K SNP, indicating that the 660 K SNP array with its much higher marker density had a significant advantage in GWAS.
Oxidases, such as PPO [15] and POD [17], could have enhanced the development of black point. The PPO gene (Ppo-A1) mapped to the long arm of chromosome 2AL in the interval IWB59334-IWB5777 (706.2-715.3 Mb) [55], overlapped with the loci on chromosome 2AL (IWB22408, 709.8 Mb) in our study. In addition, the Ppo-A1-specific marker PPO18 [56] was also significantly associated with black point resistance (  [15,17,18]. As the genetics of black point reaction are still poorly understood, the remaining 18 loci identified on chromosomes 3A, 3B, 3D, 4B (2), 5A (2), 5B (2), 6B, 6D, 7A (4), 7B and 7D (2) represent potentially new resistance QTL ( Table 2); these may contribute to better understand of the architecture of black point reaction and provide more opportunities for resistance breeding. The above results demonstrated that GWAS was a powerful and reliable tool for identification of black point resistance genes.

Candidate genes for black point resistance
To identify candidate genes for black point resistance, the flanking sequences of SNP markers significantly associated with black point reaction were imported to Blast2Go software, and used as queries to BLAST against the National Center for Biotechnology Information (NCBI) and European Nucleotide Archive (ENA) databases; six candidate genes were identified (Table 3). Bioinformatics analysis indicated that SNP marker AX-111518195 on chromosome 2AL corresponded to peroxisomal biogenesis factor 2, an important gene for biosynthesis of peroxidase, which can accelerate oxidation of phenolic compounds to quinones and is crucial for phenolic metabolism and melanin synthesis [18,58]. In addition, the gene-specific marker PPO18 for Ppo-A1 [56] overlapping with the SNP loci on chromosome 2AL was also significantly associated with black point reaction. Fuerst et al. [18] reported that PPO catalyzes oxidation of phenolic compounds to melanins and quinines  [59]. Marker IWA5463 on chromosome 2AL corresponds to an F-box repeat protein, which may affect black point development by regulating signal transduction of gibberellin [59,60]. F-box proteins have also been implicated in response to various pathogens through targeting substrates in the degradation machinery [61]. Two SNP markers (AX-108951749 on 2B and IWA2223 on 5AL) encode serine/threonine-protein kinases, which trigger multiple physiological and biochemical reactions in response to abiotic and biotic stresses by mediating perception and transduction of external environmental signals [62,63]. We also identified a candidate gene encoding a disease resistance RPP8-like protein (AX-111053669 on chromosome 3A), which had been proposed to play an essential role in regulation of responses to a variety of external stimuli, including stress [64]. Bioinformatics analysis of trait-associated SNPs was proven to be an effective tool to find candidate genes for complex agronomic traits [34]. However, black point is a consequence of complicated biological processes and the mechanism of black point formation remains unclear; more detailed experimental analyses are needed to confirm the roles of candidate genes in black point resistance.

Application of MTAs for black point resistance in wheat breeding
It is difficult to select highly resistant lines at the early stages of a breeding program in the field due to the fact that black point symptoms can be assessed only on mature seed after harvest and are highly affected by environment. A significant additive effect was identified from the linear regression between black point resistance and the number of favorable alleles, indicating that pyramiding of favorable alleles will enhance resistance. Markers significantly associated with complex traits identified by GWAS or QTL mapping can be converted into kompetitive allele-specific PCR (KASP) markers for SNP validation, MAS and QTL fine mapping [65,66]. Semi-thermal asymmetric reverse PCR (STARP) also provides a new scalable, flexible and cost-effective approach for using SNP markers in MAS [67]. QTL with consistent effects across multiple environments should be useful for MAS [68]. Thirteen of the 25 loci identified in this study were detected in two or more environments and should be suitable for MAS. Some accessions with higher black point resistance and relatively high number of resistance alleles and excellent agronomic traits, such as Kitanokaori, Norin 67, Yumai 21, Yannong 19, Zhoumai 19, and Zhongmai 871 (Additional file 13: Table S7), should be good parental lines for breeding. Our follow-up studies will focus on validating the effects of these QTL and developing friendly, tightly linked markers that can be used in resistance breeding.

Conclusions
In the present study, a GWAS for black point resistance in a diversity panel was conducted with the 90 K and 660 K SNP arrays. Twenty-five resistance loci explained 7.9-18.0% of the phenotypic variations, demonstrating that GWAS can be used as a powerful and reliable tool for dissecting genes in wheat. The markers significantly associated with black point resistance and the accessions with a higher number of resistance alleles can be used as valuable markers and excellent parent material for resistance breeding. This study improves our understanding of the genetic architecture of black point resistance in common wheat.

Plant materials and field trials
The association panel used in the present study contained 166 diverse cultivars, comprising 144 accessions from the Yellow and Huai River Valley Facultative Wheat Region of China, and 22 accessions from five other countries, including Italy (9), Argentina (7), Japan (4), Australia (1) and Turkey (1) (Additional file 1: Table  S1). All accessions were grown at Anyang (35°12′N represent the genotype, genotype × environment interaction and residual error variances, respectively, and e and r were the numbers of environments and replicates per environment, respectively.

Genotyping and quality control
Total genomic DNA for SNP arrays was extracted from five bulked young leaves from each accession using a modified CTAB procedure [69]. to 12 were performed based on an admixture model. Each run was carried out with 100,000 recorded Markov-Chain iterations and 10,000 burn-in periods. An adhoc quantity statistic ΔK based on the rate of change in log probability of data between successive K values [71] was used to predict the real number of subpopulations. PCA and NJ trees were also used to validate population stratification with the software Tassel v5.0 [44] and PowerMarker v3.25 [70] (http://www.maizegenetics.net).

Linkage disequilibrium
LD among markers was calculated using the full matrix and sliding window options in Tassel v5.0 with 12,324 evenly distributed SNP markers. The positions of these markers were based on the physical map mentioned above. Pairwise LD was measured using squared allelefrequency correlations r 2 , and significance of pair-wise LD (P-values) was measured by Tassel v5.0 with 1000 permutations. The r 2 values were plotted against physical distance and a LOESS curve was fitted to the plot to show the association between LD decay and physical map distance. The critical value of r 2 beyond which the LD was likely to be caused by genetic linkage was determined by taking the 95th percentile in the distribution of r 2 of the selected loci [28]. The intersection of the fitted curve of r 2 values with this threshold was considered as the estimate of LD range.

Genome-wide association analysis
Associations between genotypic and phenotypic data were analyzed using the kinship matrix in a MLM by Tassel v5.0 to control background variation and eliminate the spurious MTAs. In MLM analysis, the kinship matrix (K matrix) was considered a random-effect factor, whereas the subpopulation data (Q matrix) was considered a fixed-effect factor [43]. The K matrix was calculated by the software Tassel v5.0 and the Q matrix was inferred by the program Structure v2.3.4. The P value determining whether a SNP marker was associated with the trait and the R 2 indicating the variation explained by the marker was recorded. The GWAS was also analyzed using the FarmCPU software [37] by R Language (https://www.r-project.org/). Bonferroni-Holm correction [72] for multiple testing (α = 0.05) was too conserved and no significant MTAs were detected with this criterion. Therefore, markers with an adjusted -log 10 (P-value) ≥ 3.0 were regarded as significant markers for black point reaction [73][74][75], as shown in Manhattan plots using the ggplot2 code in R Language. Important P value distributions (observed P values plotted against expected P values) were shown in Q-Q plots. We checked the LD (r 2 ) among markers significantly associated with black point reaction on the same chromosomes to compare the resistance loci. LD block on the same chromosome were computed and visualized by Haploview v4.2 [76] (www.broadinstitute.org/haploview/ haploview). To compare resistance loci identified in the present study with known genes/QTL, deletion bin information for SSR and SNP markers was obtained following [23].
The effect of favorable alleles on black point resistance Each locus comprises two alleles based on SNP marker a single base substitution, transition or transversion. Alleles with positive effects leading to higher black point resistance are referred to as "favorable alleles", and those leading to lower resistance are "unfavorable alleles". The representative SNPs at the resistance loci were used to count the frequencies of favorable and unfavorable alleles and their allelic effects were determined ( Table 2). Regression analysis between favorable, unfavorable alleles and black point scores were conducted using the line chart function in Microsoft Excel 2016.

In silico annotation of SNPs
To identify candidate genes or putative protein functions of SNP flanking-regions, the flanking sequences corresponding to the SNP markers significantly associated with black point resistance were used in BLASTn and BLASTx searches against ENA (http://www.ebi.ac.uk/ena) and NCBI (http://www.ncbi.nlm.nih.gov/) databases. Sequences were imported to Blast2Go software (https://www.blast2go.com/) in fasta formats that were blasted, mapped and annotated using standard parameters embedded in the software.