Marker-trait association analysis of frost tolerance of 672 worldwide pea (Pisum sativum L.) collections

Frost stress is one of the major abiotic stresses causing seedling death and yield reduction in winter pea. To improve the frost tolerance of pea, field evaluation of frost tolerance was conducted on 672 diverse pea accessions at three locations in Northern China in three growing seasons from 2013 to 2016 and marker-trait association analysis of frost tolerance were performed with 267 informative SSR markers in this study. Sixteen accessions were identified as the most winter-hardy for their ability to survive in all nine field experiments with a mean survival rate of 0.57, ranging from 0.41 to 0.75. Population structure analysis revealed a structured population of two sub-populations plus some admixtures in the 672 accessions. Association analysis detected seven markers that repeatedly had associations with frost tolerance in at least two different environments with two different statistical models. One of the markers is the functional marker EST1109 on LG VI which was predicted to co-localize with a gene involved in the metabolism of glycoproteins in response to chilling stress and may provide a novel mechanism of frost tolerance in pea. These winter-hardy germplasms and frost tolerance associated markers will play a vital role in marker-assisted breeding for winter-hardy pea cultivar.

Since Mendel's laws of genetic heritance, pea (Pisum sativum L.) has been one model plant for genetic study 1 . In addition, pea is one of the most economically important legume crops. It can be used both as a grain and a vegetable and provides a high quality source of protein for humans and livestock 2,3 . Moreover, pea is also considered an environmentally friendly crop due to its nitrogen fixation capacity which can reduce the need for nitrogen fertilizer and greenhouse gas emissions 4,5 . Pea is classified by end use into dry pea (also known as field pea) and green pea. In 2014, pea ranked the third (following common bean and chickpeas) in dry grain and the second (next to common bean) in green pea in the global production among legume crops, respectively 6 . With the growing market demand, China became the largest green pea and the second largest dry pea producer in the world 6 .
Freezing temperatures influence the growth and reproduction of plants and limit the distribution range of them 7,8 . During the long-term process of evolution, plants evolved an adaptation to withstand cold which is known as "cold acclimation" induced by low, non-freezing temperatures 9,10 . However, frost damage will occur resulting in irreversible injury such as destruction of cell membrane system or loss of photosynthetically active tissue when plants are not acclimated 11,12 . For legumes species, frost stress is also one of the principal limiting abiotic factors affecting their production by damaging seedlings and limiting plant growth 13,14 . Therefore, striking out in depth research on plant frost tolerance is significant not only in exploring the molecular basis of cold acclimation, but also to practical production of most crops 11 .
Field pea consists of three phenological types: spring, Mediterranean and winter 15 . Currently, autumn sowing for winter types is widely accepted as a way to improve yield potential and stability in pea due to advantages of a higher biomass production through a longer life cycle as well as the avoidance of the drought and the heat stress of late spring [14][15][16] . Although both spring and winter types of field pea are grown in China, most production of pea is in the semi-tropical climate zone as an upland winter crop 17 . However, severe frost damage in winter or early spring that occurs at the seedling stage may cause high rate of seedling death and much production decrease 18,19 .
Thus, studies on frost tolerance and breeding for winter-hardy cultivars play fundamental roles in the stable increase of pea production [20][21][22][23] .
To overcome the frost damage to pea production in winter, some literature on winter hardiness in pea have been struck in previous studies. Field pea was found to exhibit a moderate freezing tolerance with a LT 50 (temperature that kills 50% of seedlings) of −4.5 °C comparing with some forage legumes 18 , while some winter hardy cultivars of pea were found to be able to adapt to a temperature range between −8 to −12 °C 20, 23 . Moreover, breeding efforts were given to the development of some winter hardy pea cultivars that have already been utilized in the temperate regions of Europe and the USA 21, 24-26 . Recently, winter hardiness was evaluated for a limited 58 accessions of pea germplasm under both field and laboratory conditions in Turkey, and identified genotypes with differential survival at −8 °C among which the higher level of winter hardiness were selected for future cultivar development 20 . Meanwhile, a large-scale evaluation of 3,672 pea germplasm for cold tolerance was conducted in the field in China and found that genotypes from winter production regions showed a higher level of cold tolerance than those from spring production regions and identified a collection of genetic resources for cold-tolerance of pea in China 22 . However, due to the significant effect of environmental heterogeneity in field screening for frost tolerance 15 , multi-location and multi-year field trails are necessary to test the cold tolerance adaptation to a more wide range of these genetic resources in China.
Association mapping, utilizing a broader gene pool including natural population, germplasm collection and breeding cultivars, has emerged as an alternative approach to traditional QTL mapping and provides a powerful tool for identification of novel functional variation and quantitative traits in plants 27,28 . There are several advantages of association mapping over traditional QTL mapping: (1) utilizing a broader genetic variations for marker-trait correlations, (2) obtaining a higher resolution mapping due to more recombination events in germplasm development history, (3) availability of exploiting previous phenotyping data, and (4) no need to develop costly, labor-intensive and time-consuming biparental populations 28,29 . Association mapping is generally applied to identify markers useful in marker-assisted selection (MAS) for superior individuals in crop breeding or in cloning functional genes governing important agronomic traits, and has been successfully performed in many crops including maize 30 , soybean 31 , Aegilops tauschii 32 and also in pea emphasizing on seed nutrients, agronomic and quality traits with germplasm collections [33][34][35] . However, the marker-trait association of frost tolerance in pea has not been reported yet.
In this study, field screening for winter-hardy germplasms and marker-trait association analysis of frost tolerance in pea were conducted to 672 diverse pea accessions including germplasm collections from China, USA and other countries and a set of previously identified cold tolerant germplasms 22 . Field evaluation data were collected across three locations in Northern China and three growing seasons from 2013 to 2016 and genotype data were based on 267 polymorphic SSR markers. The objectives of this study were as follows: (1) to screen for winter-hardy germplasms in pea, (2) to assess the pattern of genetic variation and population structure, and (3) to identify frost tolerance associated SSR markers to use in MAS pea breeding programs.

Results
Phenotype variability. Frost tolerance was evaluated as survival rate (SR) for 672 diverse pea germplasms in three locations and three growing seasons under field condition (Table 1 and Figure 1). It was observed that the SR values ranged from 0.03 (BZ_2015 and YT_2015) to 0.73 (QD_2013) ( Table 1). There were obvious variations among the three locations as well as among growing seasons. Among the three locations, QD had the highest SR

Trait
Year Location SR F value of effects  Figure 1A). On the other hand, the fluctuation curves of the lowest temperature (T min ) at different locations and in different growing seasons exhibited similar trends of the SR values. Namely, the T min at QD is the highest, YT, the second and BZ, the lowest. For different growing seasons, the winter in 2015 had a much lower T min and SR values at all three locations than the other two growing seasons ( Figure 1B, C, and D). It is obvious that the winter survival of pea is closely related to the temperature changes.
To detect the effect of variance components for SR, ANOVA were performed using a mixed-effect model. Results show that all the effects significantly affected SR at p = 0.001 level in all datasets except that genotypes affect SR in BZ_2013 at p = 0.01 level (Table 1). In combination of year and location, location exhibited the maximum effect on the variation of SR, followed by year, and then genotype. This observation demonstrates the necessity of multi-location and multi-year experiments for the evaluation of frost tolerance in pea.
In addition, pairwise correlation analysis of SR was conducted for different field experiments. The correlation coefficients range from 0.09 (BZ_2014&QD_2013, BZ_2015&YT_2013 and YT_2015&QD_2013) to 0.62 (YT_2014&QD_2014) with an average of 0.34. Although all the experiments were significantly correlated with each other, the correlation coefficients between QD_2013 and other datasets are relatively low. Besides, similar trends of distributions for SR were observed in different experiments except for QD_2013 with a much higher SR than the rest ( Figure 2).
A total of 16 accessions survived in all nine field experiments (three locations and three growing seasons and the lowest temperature record was −18 °C at BZ in 2015-16 growing season) ( Table 2). These accessions were considered as the most winter-hardy germplasms, of which ten were from China, six from European countries. The mean SR for these germplasms was 0.57 with a range from 0.41 to 0.75. It is believed that these winter-hardy germplasms will be valuable genetic resources for winter-hardy pea cultivar breeding.
Genetic variability. We genotyped 672 diverse pea accessions with 267 polymorphic SSR markers and detected a significant level of polymorphism. The number of alleles revealed by each primer pair varied from two to ten with an average of four. The average of gene diversity index and polymorphic information content (PIC) were 0.53 (ranging from 0.06 to 0.86) and 0.46 (ranging from 0.06 to 0.84), respectively.
Population structure analysis was performed to avoid false association between genotype and phenotype due to the difference of allele frequency within subpopulations. Based on structure analysis, the Ln P(D) value of different K increased from 1 to 10, with delta K showing a peak at K = 2 (Fig. S1, Supporting information), which implied two sub-populations with some admixture (with q values of less than 0.7) in the collections ( Figure 3A). The three subpopulations group I, group II and admixture represent 26%, 59% and 15% of the total entries, respectively (Table S1, Supplementary Information). We then investigated the geographic distribution of the genetic groups and found that group I was mainly from China (81%) and group II mainly distributed in countries outside of China (85%). It can be inferred that the genetic background of Chinese pea germplasms was significantly different from those of foreign countries.
The neighbor-joining (NJ) cluster analysis showed a similar results to structure analysis. Two clusters were identified and one is mainly containing group I and the other mainly containing group II ( Figure 3B).
The kinship analysis revealed that 96% of the genotypes were detected to have little or no relationship (the kinship coefficient <0.2) suggesting that the majority of the samples are genetically diverse ( Figure 3C). The relative kinship coefficient was used as K matrix in further association analysis.   Marker-trait association analysis of frost tolerance. Association analysis was conducted using the genotype data and nine phenotype data-sets obtained in different field trails with two models including GLM + Q and MLM + Q + K. Under a strict threshold of 2 × 10 −4 at a significance level of 0.05 after Bonferroni correction, a total of 58 and 24 markers were detected to have association with the frost tolerance in GLM + Q and MLM + Q + K (Figure 4), respectively, and 22 markers were present in both models (Tables S2 and S3, Supplementary Information). Comparing to GLM + Q, the significant trait-marker associations detected in MLM + Q + K decreased by nearly 60%. In addition, there are 21 and eight markers were repeatedly detected in at least two different environments in both models, respectively, and seven markers were common in both models (Table S2, Supplementary Information). These seven markers were revealed to have the most reliable associations with frost tolerance in pea (Table 3). Among the seven markers, the lowest mean of P-value and the largest r 2 were observed in the marker 24142, while the marker EST723 has the maximum associations in different environments (three repetitions in GLM + Q and four repetitions in MLM + Q + K) with the second-lowest mean of P-value.

Discussion
Frost stress is one of the major abiotic factors affecting legumes 13,14 and most production of pea in China are winter types 17 . One of the breeding objectives is incorporating frost tolerance into elite cultivars, which will help stabilize yield and expand the production areas northward to satisfy increasing demands of pea in China. This study evaluated to 672 diverse pea accessions including germplasm collections in China and from abroad plus a set of previously reported cold tolerant germplasm 22 at three locations and in three growing seasons between 2013 and 2016. Our results showed that the winter survival was varied greatly among locations and growing seasons. The survival rate ranged from 0.03 to 0.73 with an average of 0.21 (Table 1). The lowest and highest winter survival were observed in Binzhou (the northernmost location with the lowest temperature) and Qingdao (the southernmost location with the highest temperature) of the three locations, while they were also observed in 2015 as the coldest winter and in 2013 with the warmest winter of the three growing seasons, respectively ( Figure 1). It is suggested that the winter survival of pea is closely related to the temperature change of environment under field conditions. Winter survival is a complex trait affected by various abiotic stresses such as freezing temperature, soil moisture, snow cover and so on 23,36,37 . Previous studies in several crop species under controlled experiments revealed that frost tolerance is a principal cause of winter hardiness 38,39 . Our results are consistent with the above conclusion and demonstrated the relevance between winter hardiness and frost tolerance under field conditions. A considerable problem in field screening for frost tolerance is the variable environmental factors and the frost stress may not occur in the test year or may be so severe that all plants in test would be killed 15 . In an early study of field screening for winter hardiness peas in two locations and two growing seasons in the United States, 14 lines of P. sativum and four lines of P. arvense were screened as winter-hardy genotypes under field at Moscow, Idaho, while only winter and spring pea lines were separated with the protection of snow cover at Bozeman, Montata 23 . In the present study, three different latitudes of locations for field trails were set in the traditional spring-sown area in China with chilly winter to screen the winter hardy germplasms in three consecutive growing seasons. Significant correlations and distribution trends for winter survival were observed in all the experiments except QD_2013 (Figure 2), which may be due to the warm winter at Qingdao in 2013 ( Figure 1B). In addition, 16 accessions were identified as the most winter-hardy germplasms for their ability to survive in all nine field experiments. The overall mean of SR for these 16 accessions was 0.53 and the individual accession mean ranged from 0.41 to 0.75. It is worth nothing that one of the winter-hardy accessions PI 269818 from England was previously reported as winter-hardy in winter hardy evaluation study in the USA more than 30 years ago 23 . It has been proved that multi-location and multi-year experiments can be mutual verification and provide more robust phenotyping data for further studies 16,20,34 . Therefore, these winter-hardy germplasms screened in this study can be used in future MAS breeding programs.
Association analysis has been performed in pea emphasizing on seed nutrients, agronomic and quality traits with different markers [33][34][35] . This study is the first genome-wide association analysis focused on frost tolerance in   Table 3. Summary of significant marker-trait associations for frost tolerance repeatedly detected in different experiments with two models.
pea. We used 672 diverse pea accessions, and evaluated them in multi-location and multi-year field experiments; and generated genotyping data with 267 polymorphic SSR markers. In the association analysis, spurious associations arising from population structure should be avoid 40 . Population structure analysis detected two genetic groups with some admixtures and distinguished pea germplasms of China from those of other countries, which is similar with the result of NJ cluster analysis (Figure 3). Genetic distinction was also observed between the Chinese P. sativum germplasm and the global gene pool sourced outside China in a previous study to compare the genetic diversity of Pisum collections with a wider genetic background 41 . However, except for population structure, familial relatedness can also cause false positives in association analysis 42 . Kinship analysis revealed that most of the germplasms (96%) used in the present study showed little or no relationship between each other ( Figure 3C). In present study, both GLM + Q model utilized only population structure and MLM + Q + K model incorporating population structure and relative kinship were applied in association analysis. Fifty-eight and 24 markers were identified to be significantly associated with frost tolerance in pea in GLM + Q and MLM + Q + K, respectively (Tables S2 and S3, Supplementary Information). The scope of candidate markers were narrowed down using MLM + Q + K model ( Figure 4) and such reduction is consistent with other association studies 35,42 . It is suggested that the inclusion of population structure and relative kinship in the present study had taken a full account of the possible biased effects and make the association analysis more accurately 42 . Associated markers repeatedly detected in two or more different environments are considered more reliable than those only present in a single environment 34 . Seven such markers were identified in both models in present study (Table 3). Based on the genetic linkage map constructed in our previous study 43 , four of these markers are located in three linkage groups, LGI, VI and VII (two markers) ( Table 3). Previous QTL mapping studies have identified several QTLs related to frost tolerance in pea on different genomic regions including LGI, III, V, VI and VII and some of them are related to various traits involved in frost tolerance 16,44,45 . A major QTL associated with freezing stress on LGIII co-localizes with the flowering gene Hr, which can delay flowering until long days to avoid the risk of frost damage 45 . QTLs on LGVI related to raffinose metabolism and RuBisCO activity showed a potential role in cold acclimation of pea through maintaining photosynthesis and accumulating protective sugars 44 . A QTL related to freezing damage on LGVI of P. sativum is found to be corresponding to a major QTL for freezing tolerance on chromosome 6 in Medicago truncatula 46 . Previous studies also showed that frost tolerance is related to glycine degradation pathway, jasmonate pathway, photosynthesis, production of cryoprotectant molecules, proteins related to soluble sugar synthesis and antioxidant processes through the chloroplast, structural alteration in cell wall pectins and so on [47][48][49] . In present study, the marker EST1109 on LGVI significantly associated with frost tolerance is located within a functional gene that has a high homology with a gene encoding mannosyl-oligosaccharide 1, 2-alpha-mannosidase in M. truncatula 50 . The locus affecting frost tolerance is discovered in this study and how this gene works awaits further investigation. It has been reported that alpha-mannosidases were involved in N-glycan processing and play a role in the development of root and the biosynthesis of cell wall in Arabidopsis 51 . Recently, mannosidases were found to be associated with N-glycan degradation of glycoproteins in response to chilling stress in Arabidopsis 52 . Considering that the mannosidases are involved in endoplasmic reticulum associated degradation of misfolded glycoproteins 53,54 , it is speculated that mannosidases are involved in a general plant chilling stress response pathway 52 . In conclusion, the results of the present study suggested a novel mechanism involved in frost tolerance in pea and provided helpful references on the general chilling stress response pathway in plant. The distributions of SR and pairwise correlations of nine different field trails were calculated using R software (R Development Core Team 2015). In addition, analysis of variance (ANOVA) for SR were analyzed using a mixed-effect model method in SAS ® 9.3 (SAS Institute Inc., Cary, North Carolina, USA) to detect the effect of genotypes, genotypes × years, genotypes × locations, years × locations and genotypes × years × locations, (Table 1). In the mix-effect model, SR was considered as dependent variables. Genotypes, locations and years were chosen as fixed effects, while replications were set as random effects. Furthermore, the historical weather data at each location within each year were downloaded from the weather website (http://rp5.ru/Weather_in_the_world) and the change curve of the daily lowest temperature were depicted to illustrate the relationship between SR with the temperature.

Methods
Genotyping and genetic diversity analysis. Tissue samples were collected from the tender leaves of each single plant and freeze dried in liquid nitrogen. Genomic DNA was isolated using a modified CTAB method 55 and the concentration of DNA was determined by NanoDrop 2000c (Thermo Fisher Scientific Inc., Wilmington, Delaware, USA). After that, the DNA solution was diluted into working solution with concentration about 20-30 ng/ul, which will be used in further experiments.
A genome-wide scanning was conducted to these materials using 400 SSR markers developed by our laboratory and 267 polymorphic SSR markers were used in further analysis 56 . PCR amplification was performed in 10 ul reaction volume containing 50 ng genomic DNA, 1 ul 10*buffer, 0.2 ul dNTP (10 mmol L −1 each), 1 ul primer (2 umolL −1 each) and 0.4 U Taq DNA polymerase. PCR products were separated by 8% non-denaturing polyacrylamide gel electrophoresis (PAGE) and visualized by 0.1% silver nitrate staining.
Genotype of each accession was digital coded according to the amplified fragment length of PCR products and genetic diversity for each locus including the number of alleles, gene diversity, polymorphic information content (PIC) were calculated using PowerMarker 3.25 57 .
Population genetic structure. Presence of population genetic structure can result in spurious associations between markers and phenotype due to the difference of allele frequency within subpopulations 58,59 . To solve this problem, Pritchard, et al. 40 developed a statistically valid method to estimate the details of population structure and integrate the information into association analysis. Therefore, inference of population genetic structure was performed using a Bayesian clustering analysis implemented in the software STRUCTRUE 2.3.3 40,60 to assign individuals to putative populations (k). The program was run with genotype data for k values from 1 to 10, with 20000 burn-in iterations followed by 100000 MCMC (Markov chain Monte Carlo) iterations for accurate parameter estimates. Ten independent runs for each k with correlated allele frequencies using an admixture model were performed to verify the consistency of the results. A statistic Delta K (▵K) following the procedure described by Evanno, et al. 61 was applied to infer the upper-most hierarchical level of the population structure. The Q matrix of posterior probability within the best-fit subpopulations generated by the population genetic structure analysis was used as covariates in further association analysis.
Cluster analysis was also performed using neighbor-joining (NJ) method based on the allele-sharing distance by PowerMarker 3.25 57  Association analysis. Association analysis between molecular markers and various phenotypic datasets of frost tolerance was conducted using TASSEL 3.01 62 . Markers with more than 20% missing data were removed and 259 markers were used for further association analysis. Firstly, a general linear model (GLM) in presence of population genetic structure were performed with Q matrix derived from structure analysis as covariates (GLM + Q). Secondly, a mixed linear model (MLM) incorporating a finer scale relative kinship matrix (K) and population genetic structure (Q) was applied to do the association analysis (MLM + Q + K), which improves statistical power compared to "Q" only 42 . In the MLM model, population structure and marker effect were treated as fixed effect, while individuals and residual were fit as random effects. Kinship coefficient was calculated with the distance estimation based on Loiselle, et al. 63 implemented in the software SPAGeDi 64 . To control the family-wise error rate (FWER), a strict threshold of 2 × 10 −4 at a significance level of 0.05 was chosen after Bonferroni correction. Results of marker-trait association were illustrated with a Manhattan plot ( Figure 4).