Identification of QTLs controlling grain protein concentration using a high-density SNP and SSR linkage map in barley (Hordeum vulgare L.)

Grain protein concentration (GPC) is a major determinant of quality in barley (Hordeum vulgare L.). Breeding barley cultivars with high GPC has practical value for feed and food properties. The aim of the present study was to identify quantitative trait loci (QTLs) for GPC that could be detected under multiple environments. A population of 190 recombinant inbred lines (RILs) deriving from a cross between Chinese landrace ZGMLEL with high GPC (> 20%) and Australian cultivar Schooner was used for linkage and QTL analyses. The genetic linkage map spanned 2353.48 cM in length with an average locus interval of 2.33 cM. GPC was evaluated under six environments for the RIL population and the two parental lines. In total, six environmentally stable QTLs for GPC were detected on chromosomes 2H (1), 4H (1), 6H (1), and 7H (3) and the increasing alleles were derived from ZGMLEL. Notably, the three QTLs on chromosome 7H (QGpc.ZiSc-7H.1, QGpc.ZiSc-7H.2, and QGpc.ZiSc-7H.3) that linked in coupling phase were firstly identified. Moreover, the genetic effects of stable QTLs on chromosomes 2H, 6H and 7H were validated using near isogenic lines (NILs). Collectively, the identified QTLs expanded our knowledge about the genetic basis of GPC in barley and could be selected to develop cultivars with high grain protein concentration.


Background
Protein is an essential nutrient for the survival of humans and animals [1,2]. Protein in mature cereal grains, in particular, provides a substantial portion of the world's plant protein, and its concentration determines the nutritional quality and end use properties of the grain [3,4]. Barley (Hordeum vulgare L.) is one of the earliest domesticated crops in the world. Approximately 25% of its production has relatively lower GPC and is suitable for malting and brewing, while the remaining 75% with relatively higher GPC is used for feed and food (http://faostat.fao.org/). Hence, there is increasing need for breeding barley cultivars with high GPC, but this has been hindered by the relatively low heritability of GPC due to the significant interaction between environmental and genetic factors [5,6]. Based on a statistical methodology, the genetic factors (quantitative trait loci, QTLs) that involved in determination of GPC can be elucidated [7]. Thus, identification and utilization of environmentally stable QTLs associated with GPC will provide an alternative but promising strategy for high GPC barley breeding.
To date, numerous studies have been conducted on dissecting the genetic basis of GPC, and QTLs have been mapped on all seven barley chromosomes. In particular, several consensus QTLs mapped on chromosomes 2H, 4H, 5H, 6H, and 7H have been repeatedly detected by multiple studies [8][9][10][11][12][13][14][15]. For example, two QTLs on chromosomes 5HS and 6HS located in the Bmac0096-Bmag0323 and ABG458-HVM74 intervals, respectively, have been repeatedly detected [10][11][12]16]. Moreover, these two loci have also been identified by genome-wide association studies (GWAS) [17][18][19]. In addition, two genes (HvNAM1 and HvNAM2) on chromosomes 6H and 2H in barley, which were suggested to be orthologous to TtNAM-B1, contributed a substantial effect on GPC [17,20]. Notably, a recent study revealed that a single nucleotide polymorphism (SNP) within the second intron of HvNAM2 was associated with GPC, which is useful in developing high quality barley cultivars [17]. Although these identified QTLs/genes for GPC that could be expressed under multiple environments might be valuable for GPC improvement in barley, most of the genetics studies focused on breeding and selection for low-protein barley [21,22].
A saturated genetic linkage map will improve the precision of QTL localization and estimation of phenotypic variance, especially for some small and medium-sized QTLs [23]. Due to the abundance of SNPs in plant genome, SNP markers have been widely used in genetic linkage map construction [24][25][26]. High-density SNP linkage maps have been largely used in QTL detection for yield and quality in barley [27][28][29]. However, QTL mapping for GPC based on a high-density SNP map has rarely been reported. Here, to identify QTLs for GPC, a RIL population including 190 lines derived from a cross between the Chinese landrace ZGMLEL with high GPC (> 20%) and the Australian cultivar Schooner was used for linkage and QTL analyses. Furthermore, near-isogenic line (NIL) populations were developed to validate the environmentally stable QTLs.

Plant materials
A RIL population (generations F 9 to F 11 ) containing 190 RILs derived from two spring barley varieties, ZGMLEL and Schooner, was employed to identify QTLs controlling for GPC. ZGMLEL is a hull-less landrace with high GPC, while Schooner is a hulled cultivar with low GPC. All the RILs and their parental lines were kindly provided by Dr. Yawen Zeng (Yunnan Academy of Agricultural Sciences, China).

Field experiments
Field experiments were carried out in three locations, including Shangzhuang Experiment Station of CAU (China Agricultural University) in Beijing, Wangtaibao Experiment Station of NAAFS (Ningxia Academy of Agriculture and Forestry Sciences) in Ningxia Hui Autonomous Region, and Dishang Experiment Station of HAAFS (Hebei Academy of Agriculture and Forestry Science) in Hebei Province. The RIL population and the two parents were grown during three growing seasons from 2013 to 2015, providing data for six environments.
Location-year information and corresponding weather data are presented in Additional file 1: Table S1. In field trials, each plot consisted of 2 rows that were 2 m long with approximately 20 plants per row. The middle ten plants in each line were bulk-harvested at maturity and measured for grain protein concentration (GPC).
Three BC 3 F 2 populations for QTL validation were planted in Beijing (2016). Individuals were grown in 2m-long rows with a 0.25-m row spacing. Within each row, 15 plants were evenly sown. At maturity, all the panicles were harvested from single-plant and sun-dried. Grain protein concentration (GPC), grain yield (GY) and thousand grain weight (TKW) were scored on a singleplant basis.
The field experiments were in accordance with local practice. All the trails were conducted under optimum irrigation. Nitrogen (N) was supplied at a rate of 220 kg/ ha, including 70 kg/ha of N as diammonium phosphate and 80 kg/ha of N as urea applied before sowing. In addition, 70 kg/ha of N as urea was applied at booting stage.

Phenotypic evaluation and statistical analysis
Mature grains of RIL population and BC 3 F 2 populations were ground to a powder using a Cyclotec 1093 sample mill (Hoganas City, Sweden). Then, the ground powder was dried to a constant mass in an 80°C oven. The total nitrogen content was determined using the Kjeldahl method with a FOSS Kjeltec ™ 2300 and then the GPC was calculated using a factor of 5.83 [30]. GY and TKW of the NIL populations were measured on a single-plant basis. TKW was determined using a camera-assisted phenotyping system, which was provided by Hangzhou Wanshen Detection Technology Co., Ltd. (Hangzhou, China).
The basic statistical analysis was performed using SPSS version 20.0 (SPSS, Chicago, IL, USA). The Shapiro-Wilk test was conducted using R software (V. 3.2.2) for the normality test. The best linear unbiased prediction (BLUP) for GPC across the six environments was calculated using SAS ® V.8 (SAS Institute Inc. 2000) with the PROC MIXED procedure. Under the random-effect model, environments were treated as fixed, and genotype and genotype-environments interactions were considered as random factors. The broad sense heritability (h B 2 ) on a family basis was calculated using SAS ® V.8 (SAS Institute Inc. 2000) with the PROC GLM procedure, which was calculated according to the following formula: h B 2 = ó g 2 / (ó g 2 + ó ge 2 /n + ó 2 /nr) where ó g 2 = genotypic variance, ó ge 2 = genotype by environmental variance, ó 2 = the residual error variance, n = the number of environments, and r = number of replicates.

Linkage and QTL analyses
The RIL population was genotyped using the barley 9 K SNP chip developed from the RNA-seq data of barley varieties [31]. Additionally, a total of 21 polymorphic SSR markers were employed to genotype the RIL population, and most of the SSR sequences were obtained from http://wheat.pw.usda.gov/GG3/. Only markers with less than 5% missing data were selected for map construction. The genetic linkage map was constructed using RECORD 2.0 [32] and JoinMap 4.0 [33]. Markers with identical segregation were first removed using REC-ORD 2.0. After removing the redundancy, the unique markers were grouped using JoinMap 4.0 with a LOD value of 10. Finally, the marker order was established using the maximum likelihood mapping algorithm and the map distance was calculated using the Kosambi mapping function. The probe sequences of the SNP assigned to barley chromosomes were queried using the BLAST algorithm against barley reference genome sequence to locate chromosomal positions with a cutoff criterion of E-value ≤1e-10. The quality of the genetic map was validated using the alignments between SNP map and barley reference genome. Only the best hit of the SNP against the reference genome was selected for the collinearity analysis when the SNP was located to multiple paralogous positions in the genome.
The average GPC data in each environment and BLUP values across six environments were collected for QTL analysis. WinQTLCart2.5 software with the composite interval mapping (CIM) method was used to identify QTLs for GPC. The walking speed was set to 1 cM. Model 6 was chosen for QTL analysis, with 5 control markers and 10 cM window size defaults. The LOD threshold was set via 1000 permutations at P ≤ 0.05, and these QTLs were considered "identified QTLs". A 2-LOD support with a 99% confidence level was chosen for each identified QTL. The identified QTLs detected in different environments with overlapping confidence intervals were regarded as the same in this study. The QTLs were named following the rules of Blake and Blake [34].

Marker development
The flanking markers of the QTLs were employed to define the target region, which could be used to compare to the barley Genome Zipper developed by Mayer et al. [35]. Gene sequences of three grasses (rice, sorghum, and Brachypodium) were used as queries to blast against the database "assembly_WGSMorex" at IPK Barley BLAST Server (http://webblast.ipk-gatersleben.de/bar-ley_ibsc/). The Morex contigs with best hit were employed to search for simple sequence repeat using the SSR Hunter software. Finally, the selected sequences were used to design SSR markers using the Primer3 software (http://frodo.wi.mit.edu/primer3/).

Analysis of GPC
The basic statistics of minimum, maximum, mean standard deviation and coefficient of variation for GPC in six environments are listed in Table 1. The GPC of parent ZGMLEL ranged from 20.52% to 22.88% in the  Table S2). Moreover, the 190 RILs exhibited a wide range of variation in GPC, with coefficients of variation (CVs) ranging from 6.94% to 7.80% in the six environments. The Shapiro-Wilk for testing normality was performed for GPC based on the mean value collected from six environments (Fig. 1). In all of the six environments, GPC showed normal distribution, suggesting a quantitative nature of GPC in barley. Remarkably, the broad sense heritability (h B 2 ) for the GPC of the RILs was 80.67%, indicating that the GPC variance was mostly determined by genetic factors.

Construction of a high-density genetic linkage map
Of the 7864 SNP markers on the chip, 1526 (19.40%) were polymorphic between ZGMLEL and Schooner. After removing 53 SNPs with over 5% missing data, we used 1473 SNP markers and 21 polymorphic SSR markers to construct the linkage map. The resultant linkage map comprised nine linkage groups that contained 1011 unique loci and spanned 2353.48 cM. These linkage maps had an average locus interval of 2.33 cM (Table 2, Additional file 3: Table S3, Additional file 4: Figure S1). The identity and polarity of linkage groups were determined by BLAST against the barley reference sequence databases [36].
To further validate the quality of the map, SNP flanking sequences were employed to align with the barley reference sequence. Of 1473 SNP markers, 1411 (95.79%) were successfully assigned to the barley genome (Table 2; Additional file 5: Table S4). A good collinearity of the genetic map with the barley reference genome sequence was observed (Fig. 2), indicating a high quality of the genetic linkage map. However, several chromosome intervals were inconsistent with the reference genome sequence, i.e., chromosomes 2H at 76. 43-204.46    Four QTLs were found to be significantly associated with GPC on chromosome 7HS. Among these significant QTLs, three were environmentally stable QTLs, and they were designated QGpc.ZgSc-7H.1, QGpc.ZgSc-7H.2, and QGpc.ZgSc-7H.3. These three stable QTLs were detected in three to five environments, explaining 12.37-13.29% of GPC variation for the combined analysis. The last putative QTL, QGpc.ZgSc-7H.4, was detected at E6 and accounted for 7.48% of GPC variation.

QTL validation
To study the three genomic regions on chromosomes 2H, 6H and 7H that possessing environmentally stable QTLs for GPC in more depth, three BC 3 F 2 populations were developed and named BC 3 F 2 -I, BC 3 F 2 -II and BC 3 F 2 -III, respectively. Accordingly, three sets of SSR markers were used for foreground selection, i.e., 2L10, 2L11 and 2L12 for BC 3 F 2 -I, 6L89, 6L155 and 6L147 for BC 3 F 2 -II, and 7S40, 7S69, 7S87 and 7S89 for BC 3 F 2 -III (Additional file 6: Figure S2, Additional file 7: Table S5). A total of 76 SSRs were used for background selection of the BC 3 F 1 individuals. Finally, three BC 3 F 1 individuals that exhibit heterozygosity solely at genomic regions 2H, 6H or 7H were selfed to produce their corresponding BC 3 F 2 populations. These three BC 3 F 1 individuals shared 93.42, 92.10 and 94.74% similarities in genetic background with the recurrent parent, respectively.
To determine whether the stable QTLs affect the GPC in NIL populations, we compared the GPC between two homozygous groups. Based on the genotype of flanking markers (2L10 and 2L11 on 2H, 6L89 and 6L147 on 6H, 7S87 and 7S40 on 7H), two homozygous groups, namely, ZGMLEL homozygous (ZZ) and Schooner homozygous (SS) were classified in each NIL population. The evaluation results for GPC showed that plants with ZZ genotype in BC 3 F 2 -I, BC 3 F 2 -II and BC 3 F 2 -III had an average GPC of 13.82, 14.18 and 14.20%, respectively. In contrast, plants with SS genotype in BC 3 F 2 -I, BC 3 F 2 -II and BC 3 F 2 -III had an average GPC of 13.15, 13.19 and 13.48%, respectively, which is similar to the recurrent parent, Schooner (13.32%) (Additional file 8: Table S6). Based on the GPC value, highly significant difference was found between two homozygous genotypes in each NIL population (P < 0.01) ( Table 4, Additional file 8: Table S6). The allelic effects of the three populations were in the same direction as the original allele, with alleles from ZGMLEL increasing GPC. These results suggested that the stable QTLs on chromosomes 2H, 6H and 7H had significant effect on GPC, which was in agreement with the detection in RIL population. Previous studies reported that there was negative relationship between GPC and grain yield [37]. Thus, we measured thousand kernel weight (TKW) and grain yield per plant (GY) for the three BC 3 F 2 populations but found no significant difference for TKW and GY (Table 4; Additional files 8: Table S6).

Discussion
The advantages and disadvantages of the present genetic linkage map QTL mapping is a reliable way to resolve the genetic basis of GPC, and a high-density map will increase the accuracy of QTL detection [23]. In the present study, a high-density map comprised of 1473 SNP and 21 SSR, and spanned 2354.48 cM in length. Notably, our genetic map has a good collinearity with the barley reference genome, which is suitable for the identification of QTLs [38]. However, several chromosome intervals were inconsistent with the reference genome sequence. This could be partially explained by the following reasons: 1) a suppressed recombination frequency at the centromere region, 2) the presence of partially homologous sequences or duplication, and 3) the deficiency of polymorphic markers. In addition, four chromosomes (1H, 3H, 4H and 7H) had high genome coverage (95.40-99.93%), while three (2H, 5H and 6H) showed low genome coverage (63.13-65.51%), which might be caused by the lack of polymorphic markers within the centromeric region. Due to the low recombination frequency in the centromeric region, we speculated that it would not influence the identification of the QTLs.
Compared with two SNP maps reported by Close et al. [24] and Muñoz-Amatriaín et al. [25], the whole genome of our map expanded in genetic distance by 41.75 and 111.51%, respectively, with individual chromosome extended by 26.44% to 83.89% and 24.30% to 160.97%, respectively (Additional file 9: Table S7). Missing genotype data of each line and large number of heterozygotes could lead to expanded genetic distance [39,40]. Consistent with this, similar phenomenon was observed in our SNP genotype data, which could partially contribute to the large whole genetic distance. The casual reason will be an interestingly area to further investigation.

Extensive variation for GPC in barley
Determining the phenotypic variation of GPC in a segregating population is a prerequisite for elucidating its genetic foundation and for breeding barley cultivars with desirable GPC. Extensive variation in GPC in different barley genotypes has been reported previously. For example, analysis of 59 cultivated and 99 Tibetan wild barley accessions showed that the GPC ranged from 8.02% to 13.50% and Tibetan wild barley had much higher GPC than cultivated barley [17]. QTL analysis provides an efficient way to look for associations between the phenotypic variance and the markers segregating in a biparental population [41,42] and has been widely used in dissecting GPC variation in barley populations. However, the lack of parental lines with high GPC in most previous studies may have hindered the detection of possible major QTLs for GPC [10,43]. In the current study, the rare accession ZGMLEL, with consistently high GPC (20.52-22.88%), and an Australian cultivar, Schooner, with relatively low GPC (16.35-17.20%), were used to construct the mapping population. A relatively high broad sense heritability (80.67%) was found, suggesting that QTLs/genes controlling GPC are less environmentally influenced in the ZGMLEL × Schooner population. Thus, the ZGMLEL × Schooner population is a perfect material for identifying QTLs for GPC. Environmentally stable QTLs detected in this way might be suitable for marker-assisted selection (MAS) in barley breeding, which is anticipated to increase efficiency of the genetic improvement for GPC.
A significant QTL QGpc.ZgSc-4H.1 for GPC was identified in the telomeric region of chromosome 4HS and was steadily expressed in three environments. QTLs affecting GPC have been identified on 4HS [8,43,44] and 4HL [15,45,46]. For example, Marquez-Cedillo et al. identified a QTL for GPC at the region of the intermedium-c (int-c) locus, which is obviously different from QGpc.ZgSc-4H.1 (Fig. 3) [8]. Therefore, QGpc.ZgSc-4H.1 likely represents a new locus for GPC, although its contribution to the variation of GPC was relatively small.   [8,[12][13][14]. For example, Emebiri et al. [12] reported mapping of two QTLs for GPC in the telomeric and centromeric regions that are probably within the physical intervals of 15.8-40.0 and 261.8-277.6 Mb, respectively (Fig. 3) [12]. Marquez-Cedillo et al. [8] and Abdel-Haleem et al. [14] identified a consensus QTL near the nud locus on chromosome 7HL [8,14]. However, the location of these QTLs was different from that detected in the present study. Therefore, these three QTLs in adjacent intervals are likely to be new QTLs, which might be due to the utilization of specific genetic materials in the present study.

Consensus QTL regions for GPC on chromosomes 2H and 6H
An efficient method to introgress favorable alleles into elite germplasm is to select consensus QTLs that steadily affect GPC in different genetic backgrounds and environments [47]. For example, Emebiri reported that pyramiding two consensus QTLs on chromosomes 6HS and 5HS could significantly decrease GPC levels by 4% compared to the commercial check [22]. In this study, two genomic regions on chromosomes 2HL and 6HL for GPC were coincident with QTLs reported in previous studies ( Fig. 3; Additional file 10: Table S8). For example, a stable QTL QGpc.ZgSc-2H.1 on chromosome 2HL was coincident with the locus reported by Marquez-Cedillo et al. [8]. Another major QTL, QGpc.ZgSc-6H.3, explaining the highest GPC variance was mapped at a similar locus to Qpro6a detected in the Morex/Steptoe DH population [15]. However, the additive effect of our loci (0.43-0.59%), however, is higher than Qpro6a (0.14%), which might be caused by the special materials used in our study.
To date, two homologous genes, HvNAM1 and HvNAM2 associating with GPC on the short arm of chromosomes 6H and 2H, respectively, have been widely studied [17,48,49]. For example, Cai et al. [17] performed a multi-platform candidate gene-based association analysis using 59 cultivated and 99 Tibetan wild barley genotypes and found that the haplotypes of HvNAM1 and HvNAM2 markers were associated with GPC in barley. In the present study, two identified QTLs, QGpc.ZgSc-6H.3 and QGpc.ZgSc-2H.1 associated with GPC were also detected on chromosomes 6H and 2H, respectively, while they were both located on the long arms, demonstrating that HvNAM1 and HvNAM2 were obviously different from the QTLs detected in this study.

QTLs for GPC linked in coupling phase on chromosomes 6H and 7H
Neighboring QTLs associated with many important traits, such as yield and quality, that are linked in coupling phase are commonly observed in primary QTL analysis [50,51]. Previous studies have tried to dissect QTLs in coupling phase using nearly isogenic lines (NILs) or residual heterozygous lines (RHLs), and found that coupling QTLs were partially attributed to tightly linked independent QTLs [52][53][54][55]. For example, Han et al. identified two QTLs each for malt extract and for αamylase and two to three for diastatic power in a complex QTL region using advanced segregation populations [53]. In this study, we detected two genomic regions on chromosomes 6HL and 7HS, each of which harbored linked QTLs for GPC. Region 6H contains three neighboring QTLs, i.e., one environmentally stable QTL (QGpc.ZgSc-6H.3) and two putative QTLs (QGpc.ZgSc-6H.2 and QGpc.ZgSc-6H.4). These three QTLs with favorable alleles from one parent (ZGMLEL) were in coupling phase. A shadow QTL, significant but false, is caused by a real QTL in an adjacent marker interval [55]. Since QGpc.ZgSc-6H.2 and QGpc.ZgSc-6H.4 were located close to the stable QTL QGpc.ZgSc-6H.3, it is difficult to determine whether these two loci were shadow or genuine QTLs. Similarly, region 7H also contains three linked QTLs that were in coupling phase. Unlike the region 6H, region 7H harbored three environmentally stable QTLs and showed similar effects on GPC. Further studies are needed to dissect these two complex regions using advanced population.
The contribution of stable QTLs to GPC QTL effect was generally not precisely estimated in primary QTL analysis due to the genetic noise in mapping populations [56][57][58][59][60][61][62]. In view of this point, NILs were proposed and developed as an ideal population for QTL