The carotenoid biosynthetic and catabolic genes in wheat and their association with yellow pigments

Background In plants carotenoids play an important role in the photosynthetic process and photo-oxidative protection, and are the substrate for the synthesis of abscisic acid and strigolactones. In addition to their protective role as antioxidants and precursors of vitamin A, in wheat carotenoids are important as they influence the colour (whiteness vs. yellowness) of the grain. Understanding the genetic basis of grain yellow pigments, and identifying associated markers provide the basis for improving wheat quality by molecular breeding. Results Twenty-four candidate genes involved in the biosynthesis and catabolism of carotenoid compounds have been identified in wheat by comparative genomics. Single nucleotide polymorphisms (SNPs) found in the coding sequences of 19 candidate genes allowed their chromosomal location and accurate map position on two reference consensus maps to be determined. The genome-wide association study based on genotyping a tetraploid wheat collection with 81,587 gene-associated SNPs validated quantitative trait loci (QTLs) previously detected in biparental populations and discovered new QTLs for grain colour-related traits. Ten carotenoid genes mapped in chromosome regions underlying pigment content QTLs indicating possible functional relationships between candidate genes and the trait. Conclusions The availability of linked, candidate gene-based markers can facilitate breeding wheat cultivars with desirable levels of carotenoids. Identifying QTLs linked to carotenoid pigmentation can contribute to understanding genes underlying carotenoid accumulation in the wheat kernels. Together these outputs can be combined to exploit the genetic variability of colour-related traits for the nutritional and commercial improvement of wheat products. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3395-6) contains supplementary material, which is available to authorized users.


Background
Carotenoids are organic pigments commonly present in plants, photosynthetic algae and some species of fungi and bacteria. They are normally associated with thylakoid membranes of chloroplasts and often provide the yellow, orange and red pigmentation of many flowers, fruits and roots [1]. In plants, carotenoids play an important role in photosynthesis and photo-oxidative protection [2], and are the substrate for the synthesis of apocarotenoid hormones, such as abscisic acid and strigolactones [3,4]. Carotenoid actions and their relation to human health and disease have been widely reviewed [5]. Carotenoids and some of their metabolites are suggested to play a protective role in a number of reactive oxygen species (ROS)-mediated conditions, such as cardiovascular diseases, several types of cancer, and neurological, photosensitive or eye-related disorders.
Carotenoids are typically divided into two classes: carotenes, which are tetraterpenoid hydrocarbons, and xanthophylls, which contain one or more oxygen groups [6]. Carotenoid biosynthesis has been almost completely elucidated through work in Arabidopsis thaliana, rice, maize and some ornamental plants [6,7]. Briefly, the first stage of the biosynthetic process, mediated by phytoene synthase (PSY), involves the condensation of two molecules of geranylgeranyl diphosphate to produce phytoene, which normally does not accumulate in tissues (Fig. 1). In higher plants, phytoene undergoes a series of four desaturation reactions, mediated by phytoene desaturase (PDS), zeta-carotene isomerase (Z-ISO), zeta-carotene desaturase (ZDS) and carotenoid isomerase (CRTISO), that lead to the production of lycopene. Double lycopene cyclization can produce α-carotene (branch β-ε) or β-carotene (branch β-β). Subsequent modifications transform α-carotene to zeinoxanthin and lutein, and β-carotene to β-cryptoxanthin, zeaxanthin, antheraxanthin, violaxanthin and neoxanthin. The oxidative cleavage of violaxanthin and neoxanthin forms xanthoxin, which is converted to the phytohormone abscisic acid via ABA-aldehyde [3]. Strigolactones derive from β-carotenoids via a pathway involving the carotenoid cleavage dioxygenases CCD7, CCD8 and CYP711A1 [4].
Wheat is one of the most important crops worldwide and is the leading source of plant protein in human food, having a higher protein content than other major cereals, such as maize or rice [8]. In addition to their protective role as antioxidants and as precursors of vitamin A, carotenoids are commercially important as they determine the degree of whiteness vs. yellowness of the end products of wheat. Consumers usually prefer white bread made from common wheat (Triticum aestivum L. subsp. aestivum), while yellow semolina and pasta made from durum wheat (Triticum turgidum L. subsp. durum) are preferred by the market. Flour and semolina colour is mainly the result of carotenoid accumulation in the grain [9], but the final colour of end-finished products is also associated with losses during grain storage and with the oxidative degradation of carotenoids by enzymes, such as polyphenol oxidase, lipoxygenase and peroxidase, during processing [10,11].
Fig. 1 The carotenoid metabolic/catabolic pathway (modified from Vranova [57])
Flour and semolina colour in wheat is a quantitative trait controlled by several genes with additive effect, and influenced by environmental factors [12]. Mapping studies for yellow pigment content (YPC) and yellow index (YI) in several biparental populations have identified QTLs on all wheat chromosomes (reviewed in Additional file 1: Table S1). The major QTL on the long arm of chromosome 7A, accounting for up to 60% of the phenotypic variation, was detected in all studies and attributed to allelic variations of the phytoene synthase (Psy-A1) gene [13][14][15]. Although there is an increased understanding of the mechanisms regulating carotenoid content and composition, only some carotenoid biosynthetic genes have been identified and cloned in wheat, such as phytoene synthase (PSY) [13,16,17], lycopene ε-cyclase (LYCE) [18,19], phytoene desaturase (PDS) and zeta-carotene desaturase (ZDS) [20], carotenoid β-hydroxylase (BCH) [21] and lycopene β-cyclase (LYCB) [22].
As an alternative to classical linkage-based QTL mapping, the association mapping approach has received increased attention for detecting QTLs controlling complex traits [23]. One of the potential disadvantages of genome-wide association studies (GWAS) is the appearance of spurious marker-trait associations (false-positive associations) resulting from population structure and multiple testing of thousands of markers [24,25]. Association mapping can be simplified for some traits by the "candidate gene approach", that is testing SNPs within a candidate gene for a significant association with the trait [26].
The objectives of the current study were to: a) identify candidate carotenoid metabolic/catabolic genes in wheat by exploiting genomic resources and SNPs detected within the coding sequences of candidate genes; b) provide the precise map position of candidate genes on high-density SNP-based consensus maps; c) identify the genetic loci controlling yellow pigments by GWAS and candidate gene approaches using a tetraploid wheat collection coupled with the 90 K iSelect SNP genotyping array. The identification of genetic loci controlling yellow pigment accumulation/degradation will provide information on the genetic resources available to breeders to improve commercial and nutritional properties of wheat products, as well as the opportunity to develop functionally associated markers to be used in marker-assisted selection (MAS).

Identification of carotenoid biosynthetic and catabolic genes of wheat
The A. thaliana isoprenoid pathways and respective genes from AtIPD (http://www.atipd.ethz.ch/) were used to identify and download the Arabidopsis gene sequences from the TAIR database (http://arabidopsis.org/). In order to isolate the wheat carotenoid sequences, the 24 cDNAs corresponding to all identified genes from the A. thaliana database were used as queries to extract sequences of T. aestivum and of the monocots Brachypodium distachyon, Oryza sativa and Zea mays (Table 1). The in silico analysis highlighted a lack of uniformity in the acronyms and gene names/classifications used in the literature for different plant species (e.g. the carotenoid β-ring hydroxylase is named BCH in Arabidopsis, CRTR-B or HYD in maize, and BCH or HYD in rice, Brachypodium and wheat). For simplicity, we used the gene nomenclature of A. thaliana, whose isoprenoid genes have been well characterized and reported in public metabolic pathway databases.
The bootstrapped molecular phylogenetic tree (Fig. 2), based on 119 carotenoid cDNAs corresponding to orthologous sequences of the above-mentioned five plant species, showed clear clustering of the orthologs by gene family. Additionally, this analysis showed that these carotenoid genes are generally highly conserved between species, with the minimum sequence similarity being between Arabidopsis and Brachypodium for NXS (70%), and the maximum similarity observed between Brachypodium and rice for CYP97C1 (89%). Sequence similarity helped to assign putative function to the identified wheat EST sequences. Table 1 lists the GenBank entries of the carotenoid pathway genes of Arabidopsis, Brachypodium, rice, maize and wheat. The PSY gene family is tightly clustered based on the three paralogous genes, annotated as PSY1, PSY2 and PSY3, while in eudicots only the presence of PSY1 and PSY2 homologs has been reported [17,27]. The BCH characterization reported in the literature [21] was confirmed by the phylogenetic tree: Ta_BCH1 clustered with Zm_BCH2, Os_BCH2 and Bd_BCH2, while the Ta_BCH2 gene grouped with Os_BCH1.
The in silico gene expression analysis, using data from the publicly available Wheat 61 k GeneChip, revealed variation in transcription patterns for these carotenoid genes in a wide range of tissues and developmental stages in wheat (Additional file 2: Figure S1). Exploiting the PLEXdb database, the expression data were investigated to predict the genes' impact on the final carotenoid content. In general, all carotenoid genes were found to be expressed to some degree during all developmental stages, with minimum RMA-normalized expression levels of 3.53 and 4.51 for Z-ISO and CCD7, respectively, and a maximum RMA-normalized level of 12.55 for ZDS. In particular, PSY1, PSY2, PDS, ZDS, LYCB, CYP97C1, CCD1, VDE, ZEP and NCED4 showed elevated expression levels (values higher than the mean + 2 SD) in seedling leaf (phase 6), while the LYCE, BCH1 and BCH2 genes exhibited high transcript levels in anthers before anthesis (phase 10). AAO3 showed higher levels of expression in reproductive tissues, including immature pistil before anthesis. ABA2 showed the highest expression during caryopsis-embryo-endosperm growth (phases 11 to 13). Low expression values (lower than the mean − 2 SD) were detected for LYCE in roots, CYP97C1 in anthers before anthesis, CCD8 in the 22 DAP endosperm stage and NXS in floral bracts before anthesis.
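The mean ± 2 SD rule used above to call elevated or low expression can be sketched as follows. This is only an illustration, not the authors' pipeline: the function name `flag_expression`, the tissue labels and the RMA values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def flag_expression(values_by_tissue, n_sd=2.0):
    """Return (high, low) tissue lists for one gene.

    A tissue is 'high' when its value exceeds mean + n_sd * SD and 'low'
    when it falls below mean - n_sd * SD, computed across all tissues.
    """
    values = list(values_by_tissue.values())
    centre = mean(values)
    spread = stdev(values)  # sample SD (assumption; needs >= 2 values)
    high = [t for t, v in values_by_tissue.items() if v > centre + n_sd * spread]
    low = [t for t, v in values_by_tissue.items() if v < centre - n_sd * spread]
    return high, low


# Hypothetical RMA-normalized values for a single gene across ten tissues.
example = {f"tissue_{i}": 5.0 for i in range(1, 10)}
example["seedling leaf"] = 12.0
high_tissues, low_tissues = flag_expression(example)
```

With these invented values only the seedling leaf entry lies above the upper threshold, and no tissue falls below the lower one.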
After the phylogenetic analysis, a BLASTn analysis (based on percentage identity) was performed between the 24 wheat carotenoid genes and the entire wheat SNP dataset [28], which provides a marker coverage of about 85% of the genome. A total of 75 SNP markers corresponding to 19 carotenoid gene sequences were identified, with several genes containing multiple SNPs (Table 2). No SNP markers were identified within the Z-ISO, CCD7, CCD8, CYP711A1 and NXS genes. Twenty-two and 32 SNP markers were located on the consensus durum [29] and bread wheat [28] maps, respectively. This enabled us to assign genes to chromosome groups: the CRTISO genes were mapped on chromosome group 1; BCH1 and VDE on homoeologous chromosome arms 2L; LYCE on group 3; PDS on group 4; PSY2, PSY3, CCD1 and ABA2 on group 5; LUT5 on group 6; PSY1 and AAO3 on chromosome arms 7L.

Phenotypic variation for yellow pigment content and yellow index
The tetraploid wheat collection, including 233 accessions of modern and old durum cultivars, durum landraces, domesticated and wild tetraploid wheat accessions, was evaluated for yellow index (YI) in six environments, and for yellow pigment content (YPC) in two environments. The analysis of variance showed highly significant differences among genotypes in each environment; environments, genotypes and environment x genotype interaction were significant in the combined analysis across environments (not shown). Mean, range, and heritability estimates (h²B) for YPC and YI of the whole collection, and of the durum wheat sub-population in each trial are reported in Table 3. A normal frequency distribution (Additional file 3: Figure S2) was observed for both traits. Mean values of YI of the whole collection varied from 12.8 (F09) to 14.6 (V10), while mean values of the durum sub-population ranged from 13.3 (F09) to 15.3 (V10). The phenotypic variation in the whole collection (from 9.1 to 17.8) and in the durum sub-population (11.6-17.8) suggested that alleles for low and high YI were present in the T. turgidum subset of the collection. YPC in the whole collection ranged between 3.2 and 11.7 μg/g at F08, and between 2.4 and 12.6 μg/g at V09, with average values of 6.3 and 5.8 μg/g, respectively. The durum sub-population showed higher mean values than the whole collection. This would indicate that in recent decades durum wheat breeders have paid special attention to the selection of new cultivars with grain colour that will be of higher (commercial) value [30].
Table 2 Chromosome localization of the identified wheat carotenoid biosynthetic/catabolic genes on the durum [29] and bread wheat [28] consensus maps and allele frequency in the tetraploid wheat collection of 233 genotypes
Broad-sense heritability in the whole collection ranged from 0.89 to 0.94 for YI, and from 0.91 to 0.95 for YPC. The high heritability values and the correlation coefficients among environments for YI and YPC (Table 4 and Additional file 4: Table S2) indicated that both traits were stable, and that the phenotypic expression was mainly due to genotypic effects. A highly significant (at the 0.001 probability level) and positive correlation (r = 0.89) was observed between YPC and YI mean values across environments.
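The Methods are not reproduced in this section, so the exact estimator behind these heritability values is not known here. As a hedged illustration, one commonly used entry-mean form for multi-environment trials divides the genotypic variance by the phenotypic variance of genotype means, that is, genotypic variance plus the genotype x environment variance over the number of environments plus the residual variance over environments times replicates. The function name and the variance components below are assumptions made for the example.

```python
def broad_sense_heritability(var_g, var_ge, var_error, n_env, n_rep):
    """Entry-mean broad-sense heritability (assumed estimator, not confirmed
    by the source):

        h2 = var_g / (var_g + var_ge / n_env + var_error / (n_env * n_rep))
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic = var_g + var_ge / n_env + var_error / (n_env * n_rep)
    return var_g / phenotypic


# Invented variance components: two environments, two replicates each.
h2_example = broad_sense_heritability(4.0, 1.0, 2.0, n_env=2, n_rep=2)
```

Under this form, adding environments or replicates shrinks the non-genetic terms, which is one reason multi-environment means tend to give high heritability estimates.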

Association of carotenoid genes to yellow pigments
Out of 24 carotenoid candidate genes, 17 showed no SNPs in the coding sequences, failed in the array analysis, or had an allele frequency lower than 0.10 (Table 2) in the wheat collection. These genes were therefore removed from the Marker Trait Association (MTA) analysis. Seven candidate genes (PSY1, PSY2, BCH1, CYP97A3, VDE, ABA2 and AAO3) had between 1 and 5 SNPs, and a linear regression analysis was carried out between each SNP and YPC and YI (Table 4). Except for BCH1 on 2BL, one or more SNPs of each candidate gene mapped onto one or both homoeologous chromosomes were found to be significantly associated with YI, indicating their involvement in yellow pigment biosynthesis or catabolism. PSY1, BCH1, CYP97A3, VDE and ABA2 were also significantly associated with YPC. The phenotypic variation (R²) explained by each of these markers varied from 5.9 to 16.3% for YI and from 7.4 to 14.8% for YPC. The estimated allelic effects for each marker ranged from −1.34 to 1.79 units for YI, and from 1.25 to 1.97 μg/g for YPC.
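The allele-frequency filter and the single-marker regression described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: each accession is treated as homozygous and coded 0 or 1 for a biallelic SNP, missing calls are `None`, and the regression slope is taken as the allelic effect. The function names and the toy data are hypothetical and do not come from the study.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for 0/1-coded calls (None = missing)."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / len(called)
    return min(freq, 1 - freq)


def single_marker_regression(genotypes, phenotypes):
    """Least-squares regression of a trait on one SNP.

    Returns (slope, r_squared): the slope is the mean trait difference
    between the two allele classes and r_squared is the share of
    phenotypic variation explained by the marker.
    """
    pairs = [(g, y) for g, y in zip(genotypes, phenotypes) if g is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Toy data: six accessions, one SNP, one invented trait value each.
snp = [0, 0, 0, 1, 1, 1]
trait = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
if minor_allele_frequency(snp) >= 0.10:  # threshold quoted in the text
    effect, r2 = single_marker_regression(snp, trait)
```

A model of this kind ignores population structure and kinship, so it corresponds to a naive regression rather than to a mixed model.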

Detection of QTLs by GWAS
The wheat collection had been genotyped using the 90 K iSelect array. After excluding SNPs on the basis described in the methods, 13,639 SNPs in the whole collection and 9,863 SNPs in the durum sub-population were used for the association analysis. All of these SNPs have locations on the durum consensus map [29]. MTAs were initially calculated by linear regression analysis (GLM) and by three more statistical models (GLM + PCs, MLM + K, MLM + K + PCs) taking into account the confounding effects of population structure and relative kinship to minimize the occurrence of false-positive associations. As expected, the number of significant MTAs with GLM and GLM + PCs was much higher than with MLM + K and MLM + K + PCs (Additional file 5: Table S3). The strong deviation of the observed -log10(P) values from the expected distribution (see Q-Q plots in Additional file 6: Figure S3) and the high number of significant MTAs clearly indicated the detection of numerous false positives by the GLM and GLM + PCs models. Observed P values were closer to the expected distribution when the K matrix alone, or the K matrix and the PCs, were incorporated into an MLM, providing more confidence in the associations for YI and YPC detected using these models. The MLM + K and MLM + K + PCs models gave similar results; to minimize possible false positives we decided to focus on the results generated by the MLM + K + PCs model. GWAS based on mean values of YI across environments detected nine significant QTLs in the whole collection, and five QTLs in the durum sub-population (Table 5). The QTLs identified in the analysis of the whole population were on chromosomes 4A, 4B (two), 5B, 7A (four) and 7B. The QTLs identified in the durum sub-population were on 4B (two) and 7A (three). Four QTLs (two on 4B and two on 7A) were identical in both analyses (the whole collection and the durum sub-population). Out of nine significant QTLs for YI across environments, the QTL on 7A at 102.3 cM fulfilled the more stringent FDR criteria. The phenotypic variation (R²) explained by each of these markers varied from 4.8 to 6.1% in the whole collection and from 10.1 to 18.4% in the durum sub-population. The estimated allelic effects for each marker ranged from −1.25 to 1.33 units.
Table 3 Mean, range of variation, standard deviation (SD), coefficient of variation (CV) and heritability (h²B) in the whole collection and in the durum sub-population evaluated for yellow index (b*) and yellow pigment content (μg/g) in six and two environments, respectively
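The Q-Q comparison used to judge the models pairs each observed -log10(P) value with the value expected under the null hypothesis of no association, where P values are uniformly distributed. The sketch below is an illustration only: the function name is hypothetical and the plotting position (rank − 0.5)/n is an assumed convention, not necessarily the one used for Additional file 6.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs sorted in ascending order.

    Expected values come from uniform quantiles (rank - 0.5) / n.
    All P values must be strictly greater than zero.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((rank - 0.5) / n) for rank in range(1, n + 1))
    return list(zip(expected, observed))


# Two invented P values, just to show the pairing.
points = qq_points([0.1, 0.01])
```

Points lying well above the diagonal across most of the range, as reported for GLM and GLM + PCs, point to inflated test statistics; points that follow the diagonal except at the extreme tail are what a well-controlled model is expected to show.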
GWAS based on mean values of YPC over two environments (Table 6) detected three significant QTLs on chromosomes 4B (one) and 7A (two) both in the whole collection and in the durum sub-population, and one additional QTL on 4B (position 43.9 cM) in the durum sub-population. The QTL on 7A associated with the SNP marker IWB49295 located in the Psy-A1 coding sequence was consistent in both the whole collection and the durum sub-population. Out of four significant QTLs for YPC across environments in the durum sub-population, the QTL on 7A at 102.3 cM passed the FDR criteria. The phenotypic variation (R²) explained by each of these markers varied from 5.3 to 22.1%, while the allelic effects for YPC ranged from −1.90 to 1.79 μg/g.
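The text does not state which FDR procedure was applied. Assuming the widely used Benjamini-Hochberg step-up procedure, the logic of an FDR criterion can be sketched as below; the function name, the alpha level and the P values are illustrative assumptions, not values from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per input P value (same order), True when the
    test is declared significant at false discovery rate alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose P value satisfies p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[idx] = True
    return significant


# Four invented marker P values.
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

In this toy case only the first marker passes, even though three of the four raw P values are below 0.05, which mirrors how few QTLs satisfied the stricter criterion in the results above.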
To investigate the effect of environmental variation on the detection of significant QTLs by GWAS, the MTA analysis was carried out on the mean value over replicates for each of the six environments for YI and for each of the two environments for YPC (Tables 5 and 6). A high QTL-to-environment variation was observed for both traits as we identified 17 QTLs specific to single environments vs. common QTLs across environments. Considering the GWAS for YI in the whole collection, a minimum of 5 QTLs were detected at V12 and a maximum of 11 QTLs at V10. Eleven different QTLs were only identified in one environment, 7 in two environments, 4 in three environments, 1 in four environments and only 1 in five environments. Notably, no QTL was detected in all six environments. Genotype x environment (QTL x E) interaction was lower in the durum sub-population: 2 QTLs were detected in two environments, 3 in three environments, 1 in four environments and 1 in all six environments. The same trend was observed for YPC: 5 QTLs were identified in only one environment and 1 in both examined environments in the whole collection; out of 4 QTLs detected in the durum sub-population, 3 QTLs were consistent in one environment and 1 in both environments.
In Tables 5 and 6, -log10(P) values are reported for each marker in each environment and in the mean of the environments. Phenotypic variation (R²) and additive effect are reported only for markers significant in the mean of all environments (six for YI, two for YPC).

Discussion
Identification and mapping of carotenoid genes in the wheat genome
The carotenoid biosynthetic pathway has been extensively studied in model plants and crop species because of the important roles of carotenoids in both development and photosynthesis [2], and their beneficial effects on human health [5]. The wheat genome has still not been completely sequenced due to its huge size and complexity, and knowledge of the metabolic and catabolic pathways of carotenoid compounds remains incomplete. Comparative genomic analysis across different taxa allowed the transfer of functional information from well-characterized model organisms, such as Arabidopsis, rice and Brachypodium, to a less-studied taxon, like wheat. This has been beneficial for the BCH1, BCH2, CYP97C1, CCD7, CCD1 and NCED9 genes, many of which have been well characterized in rice, Brachypodium and Arabidopsis, but few of which have been studied in wheat. All the orthologues clustered by gene on the phylogenetic tree, sharing common conserved motifs in cDNA sequences. Unsurprisingly, the phylogenetic analysis revealed that the dicotyledonous PSY1 and PSY2 groups were more distantly related to those of the monocotyledonous groups, thus supporting the assumption that a single duplication event of the ancestor genes occurred before the divergence of the grass subfamilies [17,27]. Differential duplication events took place in the BCH clade. A separation of the Arabidopsis BCH paralogs suggested the same time frame as the other genes for functional diversification [21], but an unexpected separation occurred prior to the main grass subfamily divergence for rice BCH1. Further studies on gene structure and intron-exon size would facilitate a better understanding of the BCH group. The in silico expression analysis of the carotenoid candidate genes included in the present study, across a wide range of tissues and developmental stages, showed that many of these genes had similar expression profiles. Additionally, we observed that one or more genes were virtually unexpressed (such as Z-ISO and CCD7) or highly expressed (such as ZDS) in all thirteen tissues/stages (Additional file 2: Figure S1). LYCE, BCH1, BCH2, CYP97A3 and ABA genes exhibited high expression levels in the anthers prior to anthesis and in kernel tissues, indicating their potential involvement in kernel carotenoid accumulation.
With the objectives of both characterizing the carotenoid genes and investigating their relationships with the amber colour of wheat grain and flour, we analyzed a tetraploid wheat collection with the recently developed genotyping array including 81,587 gene-associated SNPs [28]. The BLASTn analysis of the entire SNP dataset against the carotenoid gene sequences allowed us to identify 1-7 SNPs in the coding sequences of 19 out of 24 examined carotenoid candidate genes (Table 1). In many cases, at least one SNP was identified for each of the three homoeologous genes present in the wheat genomes (PSY1, PSY2, PDS, ZDS, LYCE, CYP97A3, CCD1, ABA2 and AAO3). The recent availability of the high-resolution consensus maps of durum [29] and common wheat [28] allowed us to determine the precise map position of most of the carotenoid genes (Table 1 and Fig. 3). The chromosomal location of 13 carotenoid genes determined by our strategy was consistent with results reported by Crawford and Francki (2013) [19], who identified the chromosomal locations based on survey sequence from the International Wheat Genome Sequencing Consortium (http://www.wheatgenome.org/). Map positions of a few carotenoid genes, such as PSY1 and PSY2 [16] and LYCE [31], are reported in chromosome intervals as long as 5-20 cM in different SSR-based maps. The carotenoid genes are distributed on 14 of the 21 chromosomes of bread wheat, and the identification of functional markers and map positions can be particularly useful for breeders in MAS programs.

Association of carotenoid genes to yellow pigments
The allele frequencies of SNP markers corresponding to carotenoid genes were found to be highly variable in the examined wheat collection (Table 2). Several of these SNPs were either monomorphic or had a MAF < 10% and were therefore considered to be rare alleles. PSY1, PSY2, BCH1, CYP97A3, VDE, ABA2 and AAO3 were significantly associated with YPC and YI (Table 4), and this validated previous results obtained by using biparental mapping populations for PSY1 [15,16], LYCE [19,31] and AAO3 [32]. The association of the PSY2, BCH1, CYP97A3, VDE and ABA2 genes with YI and YPC is novel, and indicates that the SNP markers identified within the carotenoid gene sequences can represent a resource for developing genetic markers for use in marker-assisted breeding.
Ten carotenoid metabolic/catabolic genes were mapped in corresponding chromosome regions with QTLs detected in the current work and/or in previous QTL studies (see review in Additional file 1: Table S1 and Fig. 3) indicating possible relations between candidate genes and grain colour-related traits. Six genes (CRTISO, VDE, LYCE, PSY2, CYP97A3 and PSY1) are directly involved in the biosynthesis of carotenoid compounds [2]. Interestingly, the catabolic genes NCED9, ABA2 and AAO3, involved in the carotenoid cleavage to process violaxanthin and neoxanthin into abscisic acid, were located in chromosome regions influencing YPC [32][33][34]. These data are consistent with findings in other plant species such as Arabidopsis and maize [35,36], demonstrating that carotenoid degradation is important in determining total carotenoid accumulation.

QTLs detected by GWAS and comparison with previous studies
In addition to the candidate gene approach, we conducted a GWAS by using the GLM and the MLM models, taking into account the confounding effect of population structure and relative kinship. Q-Q plots clearly indicated the MLM (K + PCs) as the most suitable model for the GWAS of YPC and YI, thus confirming other results of GWAS on quantitative traits carried out on crop plants [37]. Several QTLs for YPC and YI, distributed on 12 of the 14 chromosomes of durum wheat, were detected (Tables 4 and 5 and Fig. 3). Four stable QTLs on 4B (two) and 7A (two) were associated with both YI and YPC, explaining the significant and positive correlation between the two colour-related traits found in the present and previous studies [38][39][40]. The higher number of QTLs for YI indicated that the yellow colour of wheat kernels depends on different biochemical pathways: in addition to carotenoid biosynthesis, these include enzymes that interact in some way with the accumulation of carotenoids, such as polyphenol oxidase (PPO), lipoxygenase (LPX) and other carotenoid oxidative enzymes [10,11]. In addition, it is possible that the wider variability of the entire wheat collection is determined by more genes influencing colour-related traits, and that some yellow pigment genes have been fixed during breeding programs for grain colour improvement and were therefore not detected in the durum sub-population.
Several studies on QTL mapping of yellow pigments in wheat have been published during the past two decades. A detailed list of QTLs detected in 26 peer-reviewed papers is reported in Additional file 1: Table S1 and the majority of them are illustrated in Fig. 3. Except for chromosome 1D, QTLs for yellow pigments were detected on all wheat chromosomes. Results of QTL mapping studies indicated many differences in the number and map position of QTLs detected in the different experiments. This may be attributed to a high number of effective genes underlying QTLs coupled with: a) different contributions from parental genotypes of mapping populations; b) QTL x environment interactions; c) differences in the carotenoid extraction procedures and colour measurement, and therefore in the gene-to-trait associations revealed; d) marker density of linkage maps used in QTL analyses; e) differences in the statistical procedures used for QTL detection and in the threshold used for the statistical significance of MTAs.
While many of the QTLs for YI and YPC identified in the current study had been described previously (see Fig. 3 for a detailed comparison), 11 QTLs detected on 1AS, 2AL, 2BS, 3BL (two), 4BS, 5AS, 5BS (two) and 7AS (two) were new. Four of these QTLs were detected in more than one environment (Table 5 and Table 6), indicating that some wheat accessions of the examined collection possess new stable alleles potentially useful for improving colour and nutritional value of wheat grain. Additionally 16 QTLs detected in the present study (on chromosome arms 1BL, 2AL (two), 3BS, 4AL, 4BS, 5AL, 5BL, 6AL (two), 7AL (five), 7BL (two)) validated QTLs previously detected in different genetic backgrounds. Therefore these QTLs can be considered as stable and useful for MAS in breeding programs.

Genotype x environment interaction and QTL detection
To investigate whether the results of GWAS were affected by environmental fluctuations, we conducted replicated trials for YI and YPC in six and two environments, respectively. Comparing the GWAS results, large variations in the number and type of QTLs were observed for both traits in different environments, thus confirming the existence of genotype x environment interaction effects as indicated by the variance analysis. Stable associations for YI in at least three out of six environments in the whole collection were detected for five QTLs corresponding to one genomic region on chromosome 5B, and four regions on 7A. In many cases, the SNP-trait associations were environment-specific, as 11 QTLs were consistent only in one environment and 7 in two environments. The same trend was observed for YPC evaluated in two environments. Despite the high heritability values (from 0.89 to 0.94 for YI and from 0.91 to 0.95 for YPC) in open field trials, the complexity of the genetic basis of the studied traits tends to confound the interpretation of GWAS results. These findings are consistent with results obtained by association mapping and QTL linkage analysis experiments on complex traits with far lower heritability, such as yield and yield components [41,42]. The present study suggests that QTL analysis for agronomically important "true" quantitative traits should always be conducted in multiple environments with different soil and climatic conditions. Finally, evaluating and taking into account the G x E interaction is important in breeding programs to identify genotypes adapted to a wide range of environments.
Fig. 3 Schematic representation of wheat genome chromosomes. The map is a representation of A and B genome chromosomes of the durum consensus linkage map [29] and of D chromosomes of the consensus bread wheat map [28], with map positions of carotenoid candidate genes and QTLs for yellow index and yellow pigment content. Each chromosome map is represented by the first and the last SNP marker, and by a SNP marker about every 20 cM. SSR markers have also been inserted about every 20 cM to compare the consensus SNP map with published SSR-based maps. Markers are indicated on the right side and cM distances on the left side of the bar. Solid regions of the chromosome bars indicate regions identified as being significantly associated with YI and YPC in published QTL biparental mapping populations (black regions in at least two different populations, grey regions in one population). QTLs are represented by bars on the right of each chromosome bar. QTL names indicate the trait (YI for yellow index and YPC for yellow pigment content) and the population in which the QTL was detected (Col = whole collection and Dur = durum sub-population); the closest SNP marker is indicated in red. Carotenoid genes are indicated after the corresponding SNP located in the gene sequence (in blue) or in the same map position of the co-migrating SNP marker located in the same contig

Comparison between simple regression and MLM analysis for QTL detection
The SNPs located in the gene sequences PSY1, PSY2, BCH1, CYP97A3, VDE and ABA2 were significantly associated with YI and YPC by regression analysis but not by GWAS analysis. Only the SNP marker IWB59875 located in the coding sequence of the abscisic aldehyde oxidase (AAO3) on chromosome arm 7AL was consistent in both MTA analyses. The PSY1, PSY2, CYP97A3 and VDE genes were mapped on chromosome regions corresponding to QTLs for YI and YPC detected in the current study by GWAS or in previous studies using biparental mapping populations (see Fig. 3). NCED, CRTISO and LYCE, which were excluded from the regression analysis as they had allele frequencies lower than 0.05, were also mapped in chromosome regions corresponding to QTLs for YI and YPC. Similar results were obtained by Zhao [43], who detected several SNPs near height-controlling genes that were consistent only by the naïve approach, and suggested that mapping populations derived from crosses between genetically distant parents could be needed to complement GWAS to reduce the rate of both false positives and false negatives. It is well known that GWAS carried out by the GLM model generally gives a high number of false positives [44], and that it is necessary to take into account the confounding effect of population structure and relatedness among individuals to control the overall probability of type I error [37]. However, reducing the number of false positives may increase the number of false negatives, and in some situations may lead to ignoring most of the important findings on the genetics and physiology of the traits of interest [45]. The combination of population genetic models and molecular biological knowledge into new QTL detection methods has recently been proposed to increase the statistical power of GWAS in human and agricultural research, so as to reduce the overall probability of type II error (false-negative associations) and incorporate biological context in GWAS results [46].

Conclusions
GWAS analysis in wheat collections can contribute to validating QTLs previously detected in biparental populations and to unravelling new QTLs for colour-related traits. The MLM models can reduce the number of false positives, while the candidate gene approach can help reduce the number of false negatives. However, GWAS analysis should be carried out on phenotypic data measured in several environments in order to detect stable QTLs and determine the genotype x environment interactions that tend to confound the interpretation of MTAs and the genetic dissection even of quantitative traits with high heritability values. The availability of markers within the coding sequences of candidate genes can help elucidate the mechanism of carotenoid accumulation in wheat kernels and exploit the genetic variability of colour-related traits for the nutritional and commercial improvement of end-finished wheat products.