GWAS for plant growth stages and yield components in spring wheat (Triticum aestivum L.) harvested in three regions of Kazakhstan

Spring wheat is the largest agricultural crop grown in Kazakhstan with an annual sowing area of 12 million hectares in 2016. Annually, the country harvests around 15 million tons of high quality grain. Despite environmental stress factors it is predicted that the use of new technologies may lead to increases in productivity from current levels of 1.5 to up to 3 tons per hectare. One way of improving wheat productivity is by the application of new genomic oriented approaches in plant breeding projects. Genome wide association studies (GWAS) are emerging as powerful tools for the understanding of the inheritance of complex traits via utilization of high throughput genotyping technologies and phenotypic assessments of plant collections. In this study, phenotyping and genotyping data on 194 spring wheat accessions from Kazakhstan, Russia, Europe, and CIMMYT were assessed for the identification of marker-trait associations (MTA) of agronomic traits by using GWAS. Field trials in Northern, Central and Southern regions of Kazakhstan using 194 spring wheat accessions revealed strong correlations of yield with booting date, plant height, biomass, number of spikes per plant, and number of kernels per spike. The accessions from Europe and CIMMYT showed high breeding potential for Southern and Central regions of the country in comparison with the performance of the local varieties. The GGE biplot method, using average yield per plant, suggested a clear separation of accessions into their three breeding origins in relationship to the three environments in which they were evaluated. The genetic variation in the three groups of accessions was further studied using 3245 polymorphic SNP (single nucleotide polymorphism) markers. The application of Principal Coordinate analysis clearly grouped the 194 accessions into three clades according to their breeding origins. GWAS on data from nine field trials allowed the identification of 114 MTAs for 12 different agronomic traits. Field evaluation of foreign germplasm revealed its poor yield performance in Northern Kazakhstan, which is the main wheat growing region in the country. However, it was found that EU and CIMMYT germplasm has high breeding potential to improve yield performance in Central and Southern regions. The use of Principal Coordinate analysis clearly separated the panel into three distinct groups according to their breeding origin. GWAS based on use of the TASSEL 5.0 package allowed the identification of 114 MTAs for twelve agronomic traits. The study identifies a network of key genes for improvement of yield productivity in wheat growing regions of Kazakhstan.


Background
Hexaploid bread wheat (Triticum aestivum L.) is a major commodity for export in Kazakhstan and is grown annually on more than 12 million hectares. The history of wheat cultivation in Kazakhstan shows that most of wheat cultivars have been developed in collaboration with Russian breeders and using Russian wheat genetic resources [1]. Even after the breakup of the USSR, this trend is still in the place as the two countries share their expertise and genetic resources based on bilateral projects and international activities under the CIMMYT umbrella [2,3]. It is also found that Kazakh-Russian wheat germplasm is genetically close to wheats from the USA [4]. It is hypothesized that the heavy importation of this crop from Russia at the end of the nineteenth century after the successful introduction of Turkish Red Wheat types to the US by Russian Mennonites [5] might be a main reason behind the close genetic relationship of Kazakh and US accessions [4].
Over 80% of the wheat harvesting area in Kazakhstan is grown with spring type and it is cultivated at higher latitudes in parts of the country, including Northern and North-Eastern Kazakhstan. The other important growing regions stretch along the Tian-Shan mountain chain in Southern and South-Eastern parts, where both winter and spring types are grown successfully. The climatic conditions in these regions are very variable, as are the soil types, the temperature during the growing season, the precipitation levels, and the photoperiod length [6]. Therefore, studies of yield performance in different ecological niches are important for strategies in current and future breeding activities across wheat growing regions of the country. It is projected that the improvement of agronomy and use of new breeding methods in this country may lead to the development of new varieties, and, consequently, improve the yield productivity up to 3 tons per hectares [7]. In the past, plant breeders successfully relied on using conventional tools and methodologies. Nowadays, the availability of new genomic tools and resources is leading to new opportunities to dissect the genetic mechanisms of complex traits associated with yield improvement [8].
As costs for high throughput genotyping are decreasing, genome-wide association studies (GWAS) are becoming a powerful approach for the detection of QTL (quantitative trait loci) associated with wheat agronomic traits, with the final goal of accelerating local breeding activities based on the application of markerassistant selection [9,10]. The success of GWAS in wheat is largely based on the development of highdensity SNP genotyping platforms by Affymetrix [11,12] and Illumina [9,13], which are now providing rich resources for high-throughput genotyping data for wheat diversity panels. In GWAS, detection of significant associations relies primarily on genetic marker coverage, the number of individuals studied, and linkage disequilibrium (LD) between causative and linked polymorphisms [14,15]. Currently, GWAS has been used successful in hexaploid wheat for identification of QTL for yield components [16][17][18], abiotic stress resistance [19][20][21] disease resistance [22,23], and grain quality [24,25]. A survey of the literature shows that GWAS is actively applied in wheat studies in many different parts of the World, including North America [26], Central America [25], Europe [18,24], Africa [19], Australia [22], and Asia [20].
Although GWAS has proven to be a very efficient approach for capturing important marker-trait associations (MTA), results reported from studies in different regions of the World are revealing the tendency for a strong influence of the growth environment in which yield QTL are identified with significant genotype x environment interaction revealed (GEI). For instance, results obtained from three different GWAS studies related to identification of QTL for yield performance in Europe [24], India [27], and Mexico [16] showed different responses and QTL for yield components in different parts of the genome. This trend is also confirmed in studies when the same germplasm was tested in different regions of Asia [20]. This outcome is congruent with result reported by Quarrie et al. (2005) from studies using bi-parental mapping populations [28], and can be explained by the sensitivity to environmental factors at crucial growth phases which determines the potential number of grains per ear [29]. Therefore, the success of regional projects may largely depend on separate, local, GWAS experiments using genotyped adapted germplasm. The main goal of this work was GWAS using spring wheat accessions from Kazakhstan, Russia, Europe, and CIMMYT (Mexico) for identification of MTA in field trials in three diverse environments of Kazakhstan. The study is the first attempt to employ GWAS for identification of important QTL and enhancing of spring wheat breeding projects in this county.

Methods
The spring wheat panel consisted of 96 commercial and prospective cultivars from Kazakhstan and the Russian Federation, 38 cultivars from Europe, and 60 CIMCOG (CIMMYT Mexico Core Germplasm) lines (CIMMYT, Mexico) (Additional file 1). Currently, 61 cultivars from Kazakhstan and Russia in this genetic panel have been registered through the State Seed Trials Commission of the Republic of Kazakhstan (2015), and are grown officially in Kazakhstan. The panel also included 29 prospective cultivars developed in Kazakhstan and Russia (Additional file 1). The European cultivar collection predominantly comprised accessions originating in the United Kingdom. The CIMCOG lines are a special population developed for studying opportunities for improvements in photosynthesis and biomass [30]. The field trials were conducted in three different latitude regions of Kazakhstan (Additional file 2), specifically at the Karabalyk breeding station (Northern Kazakhstan), DNA samples were extracted and purified from single seeds of individual cultivars using commercial kits (Qiagene, CA, USA). The DNA concentration for each sample was adjusted to 50 ng/μl. Accessions were genotyped using the wheat 90 K Illumina iSelect SNP array as described in [4].
Statistical analyses of data, including multiple factor ANOVA, Pearson's correlation and t-test were calculated using the software package GraphPad Prism 5.0 [31]. GGE Biplot methods were employed by using the GenStat package (17th release, VSN International, Hertfordshire, UK). The symmetric scaling option of both methods and available field data for all three sites were used in estimations.
GWAS analysis of QTL governing plant growth stages and yield parameters in the set of 194 accessions was performed with the TASSEL 5.0 package [32]. For this, the SNP dataset was filtered using a 10% cutoff for missing data and only markers with a minor allele frequency ≥ 0.10 were considered for GWAS. The STRUCTURE and STRUCTURE HARVESTER [29] programs were used for the development of delta K values (ΔK) and Q-matrix for identified clusters.

Results
Field performance of the spring wheat collection in three regions of Kazakhstan Data on field performance of the 194 spring wheat accessions from Europe (EU), CIMMYT (CIMCOG lines), and lines from Kazakhstan and the Russian Federation (Additional file 1) were analyzed at the experimental stations of Northern, Central and Southern regions of Kazakhstan during the 2013-2015 seasons. The length of the plant growth phases, and means of yield components in the collection from the three breeding origins were significantly different among the regions (Fig. 1). While heading time of samples from all three regions were earlier in Northern and Central regions in comparison to South Kazakhstan (Fig. 1a), maturation time length was exactly opposite (Fig.1b). Yield components, including the number of kernels per spike (NKS), was always higher in the Southern breeding station (Fig. 1c). Average YPP (2013-2015) for the three groups was not correlated across the three regions, except CIMCOG lines in the Southern region, which were significantly correlated with local and EU cultivars in the Central region (Additional file 3). In Northern Kazakhstan, which is the most important wheat growing region of the country, the yield was lowest of the sites during all the years studied. The Pearson's correlation test suggested that leading contributing factors to the YPP in Northern Kazakhstan were BD, PH, BPP, NFS, and NKS (Additional file 4). In total, 10 different observations were measured for plant growth phases in the North, Center, and South of Kazakhstan, including TT-H and TT-M dates. The Pearson correlation index suggested that in all three regions TT-H and TT-M exerted a highly significant influence on yield components, including the number of kernels per spike (NKS) and thousand grain weight (TKW) (Additional file 4).
The two developmental phases of growth, TT-H and TT-M, in this study differed among breeding origins (O effect in Table 1 and Fig. 1), and showed a significant interaction with breeding origin within the nine environments (O x R x Y) and places of growth (O x R).
The magnitude of the main effects for yield (Year, Region, and Origin), and their interactions were ranked region > origin > year > Y x R > Y x O x R > O x R, and the least effect was Y x O, as indicated by the F-values ( Table 1). The only case where the origin effect was greater over the regional effect was the result received from the PH, where the F-value for the origin effect was 1.7 times higher than for the region effect.
The GGE biplot analysis, based on yield performance in the nine environments, is separated into groups with different breeding origins both by region and year effects (Fig. 2). In the analysis of the regional effect the biplot indicates that the South and North of Kazakhstan are more suitable for accessions from Kazakhstan and the Russian Federation, while Central Kazakhstan is more favorable for accessions from Europe (Fig. 2a). In the analysis of the year effect, the group of wheat accessions from Kazakhstan and the Russian Federation was well matched to the environments of 2013 and 2015, while in 2014 was more favorable for wheat accessions from Europe (Fig. 2b). Although the three groups of accessions with different breeding origins were well separated in both scans, the separation of environment effects was different. In the case of the regional effect, both principal coordinates were efficiently discriminating the 2 mega-environments, but in the case of the year effect the separation of the 2 mega-environments was largely based on the second principal coordinate (PC2).

Genetic variation of spring wheat with different breeding origins based on SNP data
The 194 spring wheat accessions from Kazakhstan, Russia, and Western Europe were genotyped using the   [33] allowing the chromosomal location of the markers. The genotyping data for CIMCOG lines was freely available at [34] and generated by using the 35 K Axiom® array [12] [32]. From these data, the genetic variability amongst the 194 spring wheat accessions, including the 60 CIMCOG lines, was studied based on 3245 polymorphic SNP markers. The number of polymorphic markers was reduced after alignment of the three groups of accessions, whereby the smallest datasets were for the CIMCOG group (3852 SNP markers). The total length of all wheat chromosomes in the SNP map was 3109.9 cM. The average SNP density was 0.96 SNP/cM and ranged from 0.68 in the B genome to 2.31 in the D genome. The range of Nei's unbiased diversity index in the three groups of accessions varied from 0,247 in the CIMCOG lines to 0,339 in the mixed group of Kazakhstan and Russian accessions ( Table 2). The diversity index for the genomes A, B, and D were 0.286, 0.284, and 0.269, respectively. The D genome had the largest LD (r 2 0.1) blocks (26. The Principal Coordinate analysis revealed the separation of the accessions into the three distinct subgroups according to their breeding origin (Fig. 3). The first coordinate (47.34%) clearly separated Kazakhstan and Russia samples from CIMCOG lines, and the second coordinate (20.41%) separated European accessions from the other two groups (Fig. 3a). This result was congruent with the outcome from the STRUCTURE analysis where the accessions in the three clades were separated according to their breeding origin (Fig. 3b).

Identification of SNP markers for growth stages and yield components based on GWAS
In this study, the genotyping HapMap file consisted of 194 spring wheat accessions and 3245 polymorphic SNP markers. The set of polymorphic SNP data was prepared after filtering with a 10% of cutoff for missing data and markers with minor allele frequencies ≥ 0.10. After running Structure Harvester, a Q-matrix for the three identified clusters was selected for further analysis based on analysis of delta K value (ΔK). The genotyping set was analyzed separately for each of nine studied environments in Northern, Central, and Southern Kazakhstan.
In total, 114 MTAs were identified on 19 chromosomes of the wheat genome (Fig. 4., Additional file 6) and the largest number of associated markers was detected on chromosome 4B (12 MTAs). Manhattan and QQ (quintile -quintile) plots of identified MTAs in the North, Center and South regions are given in Additional files 7, 8, and 9, respectively. Forty six MTAs were identified for traits related to length of the growing stages, 68 MTAs were identified for morphological traits and yield components,  and six markers showed significance for both groups of traits (Additional file 6). Twelve QTL were simultaneously identified in two environmental sites within the region. Only two MTAs for PL mapped on the 1B and 7D chromosomes (AX-94532960 and AX-94539237) were significant for environmental sites among regions. Among MTAs for agronomic traits, 21 associations were identified for NFS, four for NKS, and seven for TGW (Additional file 6).

Discussion
To facilitate the discovery of MTAs, three groups of wheat accessions with different breeding origins were studied. In genetic terms, the accessions were clearly separated into the three subgroups, with the first subgroup (Russia/Kazakhstan) being the most diverse, and the third subgroup (CIMCOG) being the least diverse group in the analysis. The separation of accessions into the three subgroups was also congruent with population structure analysis using on STRUCTURE software. Yield performance during the 3 years was lowest in Northern Kazakhstan. Yield performance in the Southern region was always higher. However, the size of arable land in this region is limited and insignificant in comparison with the wheat growing area in Northern Kazakhstan. Results suggest that accessions in the EU collection can successfully be used in the Central region for obtaining better yield, and selected CIMCOG lines can be efficiently used for improvement of TKW in all three regions. The GGE biplot based on yield data helps to confirm that groups within the three breeding origins have different genetic backgrounds, and local accessions are well adapted to Northern and Southern regions.
Incorporation of the CIMCOG lines in the analysis resulted in a reduction of polymorphic SNP markers available for GWAS, as genotyping data for those lines was restricted to those shared between the Axiom and Illumina SNP arrays. Therefore, only 3245 aligned polymorphic SNPs were used in the GWAS with 1340 SNPs positioned in the A genome, 1448 in the B genome, and 457 in the D genome. The Diversity index was relatively high in the A and B genomes and lower in the D genome, which is well in agreement with previous observations [10,18]. GWAS was performed separately for nine field trials over 3 years and identified 114 MTAs (Additional file 6). Only 12 of the identified MTA were significant in two environments both within and between regions (Additional file 6), suggesting that stability of the associations was undermined by a strong influence of environmental factors.
A comparison of the MTAs detected in this study with those reported in other publications indicates a number of similarities. For instance, the Rht-D1 gene is known to be located at 31.5 cM on chromosome 4B [35] and close to AX-94592612 (39.0 cM), which in this study is associated with plant height, days to booting and number of fertile spikes. Other example of similarities are the locations of MTAs for peduncle length (20.6 cM), plant height and spike length (both at 28.1 cM) (Table 3), and the position of the Rht8 gene on chromosome 2D (23.0 cM) [36]. A MTA for NKS on chromosome 6A (2.7 cM) in this study was positioned at a similar location as in the study by Guo et al. [24]. MTAs for PH on chromosome 4B (AX-94592612 and AX-95117055) have a similar position to the Rht-B1 gene for plant height identified using a bi-parental mapping population tested in South-east of Kazakhstan [28]. In the same study by Quarrie et al. [28], QTL for NKS (chromosomes 1A), TKW (1D, 3A, and 4B), and NFS (1A, 4B, 5D, and 7A) were detected in similar positions for MTAs of NKS, TKW and NFS found in this study (Table 3).
Despite similarities in the genetic positions of MTAs identified with other studies, a number of MTAs from this study were missing in GWAS conducted in different geographic regions. This study revealed four MTAs related to NKS, which is one of the major yield components in wheat [28]. Surveys of the literature suggests that the loci identified on chromosomes 3B (56.9 cM) and 5B (144.1 cM) were not identified in previous GWAS studies in other regions of the World [17,25,28] nor in Quarrie et al. [28]. Therefore, it was hypothesized that these MTAs, along with other associations shown in Table 3, are new MTAs.
Recently, there have been a number of discussions related to the importance of size and level of genetic variation in diversity panels for the success of GWAS projects [14,15,37]. It was pointed out that experiments with less than 384 accessions [14] and large LD blocks [15] may lead to the identification of false positive associations. The study by Turner et al. (2016) indicated that smaller panels may allow the detection of false negative  associations that would not have been detected in the larger panels [37]. Results in this study using a relatively small panel (n = 194) are largely in agreement with the study by Turner et al. [37].