Genome-Wide Association Study of Ureide Concentration in Diverse Maturity Group IV Soybean [Glycine max (L.) Merr.] Accessions

Ureides are the N-rich products of N-fixation that are transported from soybean nodules to the shoot. Ureides are known to accumulate in leaves in response to water-deficit stress, and this has been used to identify genotypes with reduced N-fixation sensitivity to drought. Our objectives in this research were to determine shoot ureide concentrations in 374 Maturity Group IV soybean accessions and to identify genomic regions associated with shoot ureide concentration. The accessions were grown at two locations (Columbia, MO, and Stuttgart, AR) in 2 yr (2009 and 2010) and characterized for ureide concentration at beginning flowering to full bloom. Average shoot ureide concentrations across all four environments (two locations and two years) and 374 accessions ranged from 12.4 to 33.1 µmol g−1 and were comparable to previously reported values. SNP–ureide associations within and across the four environments were assessed using 33,957 SNPs with a MAF ≥0.03. In total, 53 putative loci on 18 chromosomes were identified as associated with ureide concentration. Two of the putative loci were located near previously reported QTL associated with ureide concentration and 30 loci were located near genes associated with ureide metabolism. The remaining putative loci were not near chromosomal regions previously associated with shoot ureide concentration and may mark new genes involved in ureide metabolism. Ultimately, confirmation of these putative loci will provide new sources of variation for use in soybean breeding programs.

2014), sudden death syndrome (SDS) (Wen et al. 2014), and more recently published reports identifying genomic regions related to carbon isotope discrimination (Dhanapal et al. 2015a) and N concentration, N derived from atmosphere, and C:N ratio (Dhanapal et al. 2015b). Further, three qualitative traits (flower, hilum, and pubescence color) and three quantitative traits (maturity, plant height, and seed weight) were recently analyzed by GWAS. It is likely that the number of soybean GWAS reports will substantially increase with the recent release of the SoySNP50K (Illumina iSelect SNP BeadChip) data for more than 19,000 accessions of the USDA-ARS Soybean Germplasm collection (Song et al. 2013).
Within statistical parameters, GWAS can identify genomic regions (SNP alleles) associated with the traits of interest. However, it is expected that these regions will require independent validation. Nonetheless, reducing the genome to fewer regions of interest for further study is a significant accomplishment. In a recent GWAS of soybean protein and oil (Hwang et al. 2014), most previously reported quantitative trait loci (QTL) for these traits were identified and the genomic regions in which the QTL were located were narrowed. Results such as these provide a measure of confidence in GWAS findings in soybean and in the potential of GWAS as a research tool to identify genomic loci for traits of interest.
Ureides (allantoin and allantoate) are the N-rich products of N-fixation that are transported from soybean nodules to the shoot. Considerable research has focused on the metabolism of ureides, especially with regard to drought. Ureides have long been thought to play a role in the sensitivity of N-fixation to drought that may involve a feedback inhibition resulting from accumulation of ureides in leaves and nodules during water-deficit stress (Sinclair and Serraj 1995; Serraj and Sinclair 1996; de Silva et al. 1996; Gordon et al. 1997; Purcell et al. 1998; Serraj et al. 1999, 2001; Vadez et al. 2000; King and Purcell 2005; Ladrera et al. 2007). Coleto et al. (2014) concluded that although ureide accumulation was a general stress-related response and not the cause or signal of N-fixation inhibition in common bean (Phaseolus vulgaris L.), there was a greater concentration of ureides in shoots of drought-sensitive genotypes than drought-tolerant genotypes, which is similar to results for soybean (King and Purcell 2005; King et al. 2014). Watanabe et al. (2014) suggested a possible regulatory action for allantoin in which it influences abscisic acid production, thereby affecting stress tolerance.
Although the exact role of shoot ureide accumulation in the downregulation of N-fixation has not been elucidated, a preponderance of evidence indicates that ureides may be useful in identifying genotypes that are able to continue N-fixation at relatively low soil moisture content (King and Purcell 2005; King et al. 2014). For example, Sinclair et al. (2000) used a low petiole ureide concentration as a preliminary screen to identify genotypes that might be more drought-tolerant from approximately 3000 soybean accessions. Genotypes with the lowest 10% petiole ureide concentration were selected for more selective screens, which ultimately identified eight accessions with drought-tolerant N-fixation. Our objectives in this research were to measure shoot ureide concentration in a large group of soybean accessions [374 Maturity Group (MG) IV accessions] and conduct GWAS to identify genomic regions associated with shoot ureide concentration.

Field experiments
Field experiments were conducted at two locations (Columbia, Missouri and Stuttgart, Arkansas) over 2 yr (2009 and 2010). In Columbia, the experiments were conducted at the Bradford Research and Extension Center (38°53′N, 92°12′W) and in Stuttgart, they were conducted at the Rice Research Experiment Station (34°30′N, 91°33′W). For the purpose of analysis and discussion, each year and location was treated as a separate environment and designated as C09, C10, S09, and S10 for Columbia and Stuttgart in 2009 and 2010. Experimental details were described by Dhanapal et al. (2015a). The soil texture at both locations was a silt loam [in Columbia, a Mexico silt loam (fine, smectitic, mesic Aeric Vertic Epiaqualf) and in Stuttgart, a Crowley silt loam (fine, montmorillonitic, thermic Typic Albaqualfs)] and the fields were tilled prior to sowing. In both years, sowing occurred earlier in Columbia (May 23, 2009 and May 27, 2010) than in Stuttgart (June 2, 2009 and June 10, 2010). In both years, four-row plots were used in Columbia and single-row plots were used in Stuttgart. Fertilization was based on soil tests and followed the recommendations of the University of Missouri (http://aes.missouri.edu/pfcs/soiltest.pdf) and the University of Arkansas (http://www.uaex.edu/publications/pdf/mp197/chapter5.pdf). Furrow irrigation was applied to experiments S09 and S10 as needed and experiments C09 and C10 were grown without irrigation (i.e., rainfed).

Experimental design
As reported by Dhanapal et al. (2015a,b), 385 soybean [Glycine max (L.) Merr.] MG IV accessions were sown in a randomized complete block design with three replications in all four environments (two locations and 2 yr; C09, C10, S09, S10). The accessions evaluated were obtained from the USDA-ARS Germplasm collection based on GRIN (Germplasm Resources Information Network, www.arsgrin.gov) data and with the assistance of the collection curator, Dr. Randall Nelson. The selected accessions were all MG IV genotypes with seed yield >1.7 Mg ha−1 and good agronomic traits (height, lodging, shattering, etc.). Because SNP data on the germplasm collection were not yet available at the time when entries were selected, genetic diversity was estimated by considering country and province of origin in proportion to the number of entries from that source in the germplasm collection. For the ureide trait described in this analysis, data on 374 of the 385 accessions are reported and analyzed.

Ureide sampling and analysis
The above-ground portions of five plants chosen at random from each plot were harvested between beginning bloom (R1) and full bloom (R2) [stages according to Fehr et al. (1971)]. In Columbia, sampling took place 53 d after planting (DAP) in both years, and in Stuttgart it took place 50 DAP in 2009 and 61 DAP in 2010. Harvested plants were dried in an oven at 60°C until completely dry. The five-plant samples were then ground in a Wiley Mill (Thomas Model 4 Wiley Mill; Thomas Scientific, NJ, USA) to pass through a 2-mm screen, mixed, and then a subsample was ground a second time using a UDY Cyclone sample mill with a 1-mm screen (MODEL 3010-014; UDY Corporation, CO, USA). Ureides were extracted by placing 0.125 g of the ground shoot material in a test tube with 5 mL of 0.2 M NaOH. After placing test tubes in a water bath at 100°C for 30 min, a 1-mL aliquot was transferred to a 1.5-mL microfuge tube and centrifuged at 20,000 × g for 5 min. Fifty to 100 µL of the supernatant was analyzed for ureides using the colorimetric procedure of Young and Conway (1942).

Analyses
In each environment (C09, C10, S09, and S10), the experimental design was a randomized complete block. Ureide values were log-transformed to equalize variance among environments. ANOVA was conducted on the log-transformed values within and across environments using a general linear mixed model with environments and accessions treated as random effects. Additionally, accession mean ureide concentrations were obtained as best linear unbiased predictors (BLUPs) using PROC GLIMMIX of SAS (ver. 9.3; SAS Institute, Cary, NC). Ureide phenotypic means are shown in Supporting Information, File S1. SNP data on the 374 accessions evaluated in this study were obtained from the SoySNP50K iSelect SNP BeadChip (Song et al. 2013) now curated on SoyBase (www.soybase.org). Brief instructions for obtaining the SNP data are shown in File S2. For the 374 accessions, 33,957 SNPs had a minor allele frequency (MAF) of ≥3%, which was the threshold for inclusion in the analysis reported herein. This threshold was chosen to facilitate the identification of rare genotypes. BLUPs of ureide accession means derived for each individual environment, and also across all environments, were used for genome-wide association analysis. Linkage disequilibrium (LD) was calculated using all SNPs with a MAF ≥3% among the 374 soybean accessions and distributed over the 20 soybean chromosomes. Calculation of pairwise LD (r²) among SNPs was based on SNPs within a 1-Mb window using PLINK (Purcell et al. 2007) software. Separate LD calculations were performed for euchromatic and heterochromatic chromosomal regions. JMP Genomics 7.1 (SAS Institute, Cary, NC) was used to perform the genome-wide association analysis and to generate covariate matrices to account for population structure (Q-matrix) and genetic relatedness (K-matrix). First, the K-matrix was generated using allele sharing similarity, and from this the Q-matrix with eight dimensions (Dhanapal et al. 2015b) was generated using multidimensional scaling (Kruskal and Wish 1978; SAS Institute 2008) to identify grouping patterns. However, the K-matrix used in the genome-wide association analysis was generated using identity-by-descent. Because the Q-matrix is derived from the K-matrix, part of the Q/K relationship is strictly due to this computation. Using both measures provides a more conservative accounting of population structure. The Null Model Likelihood Ratio Test (SAS ver. 9.4; SAS Institute, Cary, NC) indicated that the Q-K model significantly (P ≤ 0.005) improved the description of variance between genotypes in all environments as compared to a model without adjustment for genetic relatedness. Both matrices were generated using all 33,957 SNPs with a MAF ≥3%. These matrices were used with the Q-K Mixed model procedure (PROC GLIMMIX) to test for association between ureide concentration and SNP while simultaneously adjusting for population structure and genetic relatedness (Yu et al. 2006). The model used fixed effects for SNP and each element in the Q-matrix as a covariate and random effects for each element in the K-matrix as a covariate.
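For orientation, the Q-K mixed model referred to above is usually written in the general form given by Yu et al. (2006). The notation below is an assumption introduced here for illustration (these symbols do not appear in the original text), and the exact parameterization used by JMP Genomics may differ:

```latex
\[
  \mathbf{y} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad
  \mathbf{u} \sim N\!\left(\mathbf{0},\, \sigma_g^{2}\mathbf{K}\right),
  \quad
  \mathbf{e} \sim N\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right)
\]
```

Here y is the vector of BLUP log-ureide values, α is the fixed effect of the SNP being tested (with design matrix S), v holds the fixed effects of the eight Q-matrix dimensions, u holds the random polygenic effects with covariance proportional to the K-matrix (Z maps accessions to those effects), and e is the residual.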
SNP-trait associations were conducted on ureide concentration within individual environments as well as for the average ureide concentration across all four environments. Within environments and for the average across environments, the threshold for declaring a significant association was set to P ≤ 0.0001, which is comparable to or more stringent than that reported in other soybean GWAS (Hao et al. 2012; Hwang et al. 2014; Mamidi et al. 2014; Zhang et al. 2015).
To take advantage of the four independent environments utilized in this study, the within-environment results were further analyzed by considering the joint probability for all possible two-environment and three-environment combinations. By definition (Mendenhall and Scheaffer 1973), the joint probability of being wrong both times is the probability of falsely rejecting the test of significant SNP effect in one environment × the probability of falsely rejecting the test of significant SNP effect in the other environment. Joint probabilities were calculated by multiplying the respective P values of each SNP in all possible two-environment and three-environment combinations. A multiple testing adjustment (FDR) (Benjamini and Hochberg 1995) at a threshold of P ≤ 0.01 was applied across all SNPs and joint probabilities collectively using the "P-Value Adjustment" of JMP Genomics v7.1. By-environment probabilities, joint probabilities, and adjusted joint probabilities are shown in File S3. Six SNPs identified in these analyses that showed conflicting results were eliminated as likely false-positive results.
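The joint-probability step described above can be sketched as follows. This is a minimal illustration, not the JMP Genomics implementation: the function names and the example P values are hypothetical, and the adjustment shown is the standard Benjamini-Hochberg step-up procedure, which is assumed here to correspond to the FDR option that was used.

```python
from itertools import combinations
from math import prod


def joint_probabilities(p_by_env, sizes=(2, 3)):
    """Multiply one SNP's P values over every combination of environments.

    p_by_env maps an environment label (e.g. "C09") to that SNP's P value.
    Returns {(env, env, ...): product of the P values}.
    """
    joint = {}
    for k in sizes:
        for combo in combinations(sorted(p_by_env), k):
            joint[combo] = prod(p_by_env[env] for env in combo)
    return joint


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for a single SNP in the four environments.
example = {"C09": 0.01, "C10": 0.02, "S09": 0.5, "S10": 0.2}
joint = joint_probabilities(example)
adjusted = benjamini_hochberg(list(joint.values()))
```

In the study the adjustment was applied collectively across all SNPs and all joint probabilities, so in practice the list passed to the adjustment would pool the products from every SNP rather than from one SNP as in this toy call.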

Data availability
File S1 contains the ureide means of the 374 plant introductions for the four environments. File S2 describes how to obtain the SNP data from www.soybase.org. File S3 contains the by-environment probabilities, two-environment joint probabilities, and FDR-adjusted two-environment joint probabilities for all 33,957 SNPs tested for significant associations with ureide content. Table S1 shows the SNP information for all SNPs identified as significantly associated with ureide concentration.

Environment and accessions
Although environmental conditions were generally more similar between years in Columbia and in Stuttgart than between the locations, considerable variation in environmental conditions was observed among all four environments (Figure 1). Solar radiation generally was greater in Stuttgart than in Columbia for both years and generally greater in 2010 than in 2009 for Stuttgart (Figure 1A). The relative maximum and minimum temperatures (Figure 1, B and C) indicated that Stuttgart in 2010 was the warmest environment and Columbia in 2009 was the coolest environment. Overall, Columbia received more rainfall than did Stuttgart (Figure 1D), but the experiments in Stuttgart were supplemented with irrigation as needed (data not shown).
Ureide concentration data were obtained on a total of 374 accessions in each of the four environments. Overall, the accessions represented 11 different national sources and a total of at least 37 different provinces within those countries (Table 1). Based on population structure analyses for these accessions and the SNP data set, Dhanapal et al. (2015a,b) previously determined that they could be grouped into eight subpopulations.
Using all SNPs with a MAF ≥0.03 and all 374 genotypes, LD was separately calculated for euchromatic and heterochromatic chromosomal regions. In euchromatic regions the mean LD (r²) declined to 0.2 within approximately 185 kbp, which is approximately half that reported by Hwang et al. (2014) and Zhang et al. (2015) for a similar number of SNPs but with different and fewer soybean accessions. However, it was very similar to that reported by Dhanapal et al. (2015a) for the same set of soybean accessions used herein, but with fewer SNPs. As reported by others (Hwang et al. 2014; Zhang et al. 2015), the LD was very different in euchromatic regions than in heterochromatic regions. In this study, LD in the heterochromatic regions did not decay to half of the maximum value within 1 Mb, which was similar to that reported by Dhanapal et al. (2015a).
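The MAF filter and the pairwise r² values summarized above can be illustrated with a small sketch. It is not the PLINK implementation: it assumes, for simplicity, that each accession is homozygous at every SNP so that a call can be coded as 0 or 1 (with None for a missing call), and the function names and toy data are hypothetical.

```python
from itertools import combinations


def minor_allele_frequency(calls):
    """MAF for one SNP; calls are 0/1 allele codes, None when missing."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / len(observed)
    return min(freq, 1.0 - freq)


def r_squared(calls_a, calls_b):
    """LD r^2 = D^2 / (pA(1-pA) pB(1-pB)) over accessions typed at both SNPs."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # one SNP is monomorphic in the shared sample
    d = p_ab - p_a * p_b
    return d * d / denom


def pairwise_ld(snps, min_maf=0.03, window_bp=1_000_000):
    """snps: list of (position_bp, calls) for one chromosome.

    Returns (pos_a, pos_b, r2) for SNP pairs that pass the MAF filter
    and lie within window_bp of each other.
    """
    kept = sorted((s for s in snps if minor_allele_frequency(s[1]) >= min_maf),
                  key=lambda s: s[0])
    results = []
    for (pos_a, calls_a), (pos_b, calls_b) in combinations(kept, 2):
        if pos_b - pos_a <= window_bp:
            r2 = r_squared(calls_a, calls_b)
            if r2 is not None:
                results.append((pos_a, pos_b, r2))
    return results
```

Averaging the returned r² values in distance bins, separately for euchromatic and heterochromatic regions, would give a decay curve of the kind described in the text.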
The average shoot ureide concentrations across all four environments and 374 accessions ranged from 12.4 to 33.1 µmol g−1 with a minimum to maximum range from 7.4 to 50.5 µmol g−1 (Table 2; Figure 2A). These values were comparable to previously reported values for a set of 96 recombinant inbred lines that ranged from 18.6 to 39.0 µmol g−1 across 4 yr of experimentation with a minimum to maximum range of 9.8 to 64.0 µmol g−1 (Hwang et al. 2013). In both years, the average ureide concentration was higher in Columbia than in Stuttgart (overall 167% higher in 2009 and 34% higher in 2010) (Table 2). The range of ureide concentrations among the accessions was also wider in Columbia for both years compared to those measured in Stuttgart (Figure 2A). By treating accessions (i.e., genotypes) as random effects, ANOVA showed that the variability due to accessions (i.e., heritability) was 33%, 32%, 23%, and 38% for the C09, C10, S09, and S10 environments, respectively. Combined across environments, the total variability was 32% (14% for accessions plus 18% for environment × accession interactions). In all cases, the random effects were significant based on the 95% CIs of the variance component estimates. ANOVA across environments and treating accessions as fixed effects revealed significant differences among accessions (F = 4.24; P < 0.0001) and a significant accession × environment interaction (F = 2.17; P < 0.0001). Regression of accession means between locations within each year indicated little correspondence (r² = 0.12 and P < 0.0001 in 2009 and r² = 0.02 and P = 0.0058 in 2010). Better correspondence between years was observed for the Columbia (r² = 0.29, P < 0.0001) than the Stuttgart (r² = 0.07, P < 0.0001) location.
For each environment, the 374 accessions were ranked from lowest to highest ureide concentration and then the average ranking was generated across all four environments. Table 3 shows the 20 accessions (approximately 5% of the total number of accessions) with the lowest average ranking and the 20 accessions with the highest average ranking along with the ureide concentration in each environment. PI 507424 had the lowest average ranking and PI 424292 had the highest average ranking (Table 3). Overall, the average ranking indicates that these 40 accessions were more consistent in their respective category (high or low ureide concentration) and they likely represent the most consistent extremes for ureide concentration among the 374 accessions evaluated. The 20 accessions with the lowest average rank for ureide concentration across environments were from Japan (seven accessions), China (six), South Korea (four), North Korea (two), and Georgia (one) (Table 3). However, for the 20 accessions with the highest average rank, 16 were from South Korea, three were from Japan, and one was from China (Table 3).
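The ranking procedure described above can be sketched in a few lines. The accession labels and values below are hypothetical placeholders rather than data from Table 3, and the handling of ties (tied accessions share the mean of their rank positions) is an assumption, since the text does not say how ties were treated.

```python
def rank_lowest_first(values):
    """Rank accessions within one environment; rank 1 is the lowest value.

    values maps accession -> ureide concentration. Tied accessions
    share the mean of the positions they occupy (assumed convention).
    """
    ordered = sorted(values, key=values.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def average_rank(values_by_env):
    """Average each accession's within-environment rank over environments.

    values_by_env maps environment -> {accession: ureide concentration}.
    Only accessions measured in every environment are returned.
    """
    per_env = [rank_lowest_first(v) for v in values_by_env.values()]
    shared = set.intersection(*(set(r) for r in per_env))
    return {acc: sum(r[acc] for r in per_env) / len(per_env) for acc in shared}


# Hypothetical example with three accessions and two environments.
toy = {
    "C09": {"acc_A": 10.0, "acc_B": 20.0, "acc_C": 30.0},
    "S09": {"acc_A": 15.0, "acc_B": 12.0, "acc_C": 40.0},
}
avg = average_rank(toy)
```

Sorting the result by average rank and taking the first and last 20 entries would reproduce the kind of selection reported in Table 3.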

SNP-ureide associations
Potential marker associations with ureide concentration were evaluated by comparing the BLUP mean Log(ureide) concentrations of the two homozygous marker alleles for each of the 33,957 SNP markers with a minor allele frequency ≥0.03 across all 374 accessions. Log values were used to equalize variances among environments and BLUP means were used to help reduce the effect of extreme values (see Figure 2). For each marker, data were analyzed independently within each of the four environments as well as for the overall mean across all four environments. To help control false-positive associations, analyses were conducted with adjustments for population structure (Q-matrix) and genetic relatedness (K-matrix) (Yu et al. 2006; Zhu et al. 2008). On average the Q- and K-matrix adjustments reduced the number of significant associations detected by approximately 68% at P = 0.10 up to approximately 99% at P ≤ 0.0001 across environments and for the overall mean (data not shown).
Adjusting for population structure and genetic relatedness and using a stringent probability threshold of P ≤ 0.0001 identified 40 SNPs with significant associations with ureide concentration in at least one of the four environments (8, 14, 14, and 4 SNPs for C09, C10, S09, and S10, respectively) as well as 15 significant SNP associations with the overall mean. Of these SNPs, seven were significant both in one environment and for the overall mean. Thus, a total of 48 unique SNPs were identified as significant (P ≤ 0.0001) in at least one environment, with the overall mean, or both. A list of these SNPs and their details is provided in Table S1.
The above results considered associations within each individual environment and for the mean ureide concentration across the four environments. To further take advantage of the multiple environments used in this study, we also calculated the joint probability (Mendenhall and Scheaffer 1973) of SNP-trait associations in all two-environment or three-environment combinations and collectively applied an adjustment for multiple testing (Benjamini and Hochberg 1995) with a threshold of P ≤ 0.01. No SNPs in any of the three-environment combinations met the P ≤ 0.01 threshold. However, 141 SNPs met the threshold in at least one of the two-environment combinations. A list of these SNPs and their details is provided in Table S1.
All 15 of the SNPs associated with the mean over all four environments and all but five of the 40 SNPs identified in at least one individual environment were also found to be significantly associated by the joint probability analysis across environments. Thus, a total of 146 (141+5) unique SNPs were identified in individual environments, by the mean over all environments, or by considering two-environment combinations. The relative genomic locations of these 146 SNPs are shown in Figure 3. Considering that closely spaced SNPs likely identify the same locus, these 146 SNPs comprise 53 putative loci (Table 4; Figure 3). Table 4 presents a summary of the SNP information at each putative locus. For those putative loci identified by multiple SNPs, one representative SNP is shown in Table 4, but information for all 146 individual SNPs is shown in Table S1. The number of SNPs tagging each putative locus ranged from 1 to 26 (locus 21, Table 4), with nearly half (24) of the putative loci being identified by multiple SNPs.
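One simple way to collapse closely spaced SNPs into putative loci is to start a new locus whenever the gap to the previous significant SNP on the same chromosome exceeds some distance. The sketch below illustrates that idea only; the text does not state the rule or distance actually used to define the 53 loci, so `max_gap_bp` is a hypothetical parameter and the positions are made up.

```python
def group_into_loci(snps, max_gap_bp):
    """Group significant SNPs into loci by physical proximity.

    snps is an iterable of (chromosome, position_bp) tuples.
    A SNP joins the current locus when it is on the same chromosome and
    no more than max_gap_bp from the previous SNP; otherwise it starts
    a new locus. max_gap_bp is an assumed, user-chosen threshold.
    """
    loci = []
    for chrom, pos in sorted(snps):
        if loci:
            last_chrom, last_pos = loci[-1][-1]
            if chrom == last_chrom and pos - last_pos <= max_gap_bp:
                loci[-1].append((chrom, pos))
                continue
        loci.append([(chrom, pos)])
    return loci


# Hypothetical positions: two nearby SNPs, one distant SNP, one on another chromosome.
example = [(3, 1_000), (3, 1_500), (3, 900_000), (10, 1_200)]
loci = group_into_loci(example, max_gap_bp=1_000)
```

A threshold informed by the LD decay distance in the relevant chromosomal region would be one reasonable choice, but that is a design decision left open here.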
Of the 53 total putative loci identified, 19 were identified by at least one SNP with a significant association in one or more individual environments (by environment, Table 4; Figure 3). Ten putative loci were identified by one or more SNPs associated with the mean over all environments (by mean, Table 4; Figure 3). Four putative loci (loci 1, 26, 32, and 48, Table 4; Figure 3) were identified by SNPs with significant associations both in individual environments and with the mean over all environments. All but two of the 53 putative loci (loci 23 and 46; Figure 3) were identified by at least one two-environment combination in the joint probability analysis. Both of the loci not identified in the joint probability analysis were each identified by a single SNP in one environment (Table 4). Considering both significant associations in individual environments and the joint probability over environments, 16 of the putative loci had three of the four environments contributing a significant SNP association, and for seven putative loci, all four environments contributed (Table 4). The average effect of all SNPs comprising a given locus is shown as the average percent change in ureide concentration between the mean of those genotypes with the major allele and those with the minor allele (Table 4). Thus, those loci with a negative percent change indicate that the genotypes with the minor allele had a greater ureide concentration than those with the major allele. The average percent change within a putative locus ranged from −22.8 to 18.2%. Twenty-six of the 53 putative loci were associated with an increase in ureide concentration for those genotypes having the minor allele. For those putative loci tagged by multiple SNPs, the response (negative or positive) was the same across all SNPs comprising that locus. Values for each significant SNP are shown in Table S1.
Two of the 53 putative loci identified based on our stringent criteria were near ureide QTL (on chromosomes 13 and 19) previously identified by Hwang et al. (2013) in a biparental mapping population (Figure 3). However, application of a lower threshold in a single environment or with the overall mean revealed at least one significant SNP near all eight QTL identified by Hwang et al. (2013). This illustrates that more stringent criteria, although providing greater confidence in the identified SNP-ureide associations, may not detect other "real" associations revealed using less stringent criteria. Nonetheless, while the use of more stringent thresholds may eliminate some real SNP-trait associations, we have greater confidence that those SNPs that meet the more stringent thresholds warrant in-depth evaluation.
In addition to reported ureide QTL, a search was conducted in SoyBase for genes that might be related to ureide metabolism and that were located within 3 Mbp of the 53 putative loci shown in Figure 3. This search revealed 38 likely ureide-related genes that are located near 30 of the putative loci identified in this study (Figure 3; Table 5). The genes identified were directly involved in the synthesis of ureides (e.g., uricase), the catabolism of ureides (e.g., allantoate and ureidoglycolate amidohydrolases), or in a biochemical pathway related to ureide metabolism (e.g., nucleotidases).
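The proximity search described above amounts to a window lookup around each putative locus. The sketch below is illustrative only: it does not query SoyBase, the gene names and coordinates are hypothetical, and measuring distance from the representative SNP to the nearest gene boundary is an assumption about how "within 3 Mbp" was evaluated.

```python
def genes_near_loci(loci, genes, window_bp=3_000_000):
    """List candidate genes within window_bp of each putative locus.

    loci maps locus id -> (chromosome, snp_position_bp).
    genes is a list of (name, chromosome, start_bp, end_bp).
    Returns {locus id: [(gene name, distance_bp), ...]}; the distance is
    zero when the SNP falls inside the gene.
    """
    hits = {}
    for locus_id, (chrom, pos) in loci.items():
        for name, gene_chrom, start, end in genes:
            if gene_chrom != chrom:
                continue
            distance = max(start - pos, pos - end, 0)
            if distance <= window_bp:
                hits.setdefault(locus_id, []).append((name, distance))
    return hits


# Hypothetical loci and gene annotations.
loci = {1: (10, 5_000_000), 2: (15, 1_000_000)}
genes = [
    ("gene_A", 10, 7_500_000, 7_510_000),   # 2.5 Mbp from locus 1
    ("gene_B", 10, 9_000_000, 9_004_000),   # 4.0 Mbp from locus 1
    ("gene_C", 15, 990_000, 1_010_000),     # spans the locus 2 SNP
]
near = genes_near_loci(loci, genes)
```

Because a 3-Mbp window is wide relative to the euchromatic LD decay distance reported earlier, hits from a search of this kind are candidates for follow-up rather than confirmed causal genes.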

DISCUSSION
Even though a relatively large number of SNPs (33,957 SNPs; MAF ≥3%) were evaluated in this study, gaps of various lengths in the coverage of almost every chromosome (particularly chromosomes 1, 5, 11, 12, and 20) are visible in Figure 3. Many of these gaps are near centromere locations as reported in SoyBase and shown in Figure 3. Chromosomal regions around centromeres have long been known to have less recombination and greater heterochromatic DNA (Slatis 1955; Haupt et al. 2001; Westphal and Reuter 2002; Talbert and Henikoff 2010). The lack of SNP variability in these genomic regions is not surprising.
To help control false positives we used BLUP means (to reduce the effect of extreme values) and accounted for both population structure (Q-matrix) and genetic relatedness (K-matrix) (Yu et al. 2006; Zhu et al. 2008; Dhanapal et al. 2015a,b). We also applied high thresholds for reporting significant associations with ureide concentration in each environment or with the overall mean (P ≤ 0.0001). Additionally, we examined associations over multiple environments using joint probabilities adjusted for multiple testing (Benjamini and Hochberg 1995) at a threshold of P ≤ 0.01. In total, we identified 53 putative loci (Figure 3) associated with ureide concentration. Of these, 29 were identified by a single SNP (Table 4). Lowering the stringency levels in the analysis would likely identify other SNP associations near these loci but may also increase the number of false-positive associations detected. All but two of the 53 putative loci were identified in more than one environment (loci 23 and 46, Table 4). Of those identified in multiple environments, 28 were identified using data from two environments, 16 were identified from three environments, and seven were identified from all four environments. Loci with significant SNP-trait associations over multiple independent environments may indicate that the associated genes are more stably expressed (i.e., less environmental influence). The stringent conditions used in the analysis provide confidence that these loci warrant more detailed investigation.
Table 4 footnotes:
b Glycine max chromosome number.
c Location of the SNP on the chromosome in bp.
d Number of significant SNP associations identifying the putative locus.
e Range in bp over which the SNPs identifying the putative locus were located.
f Minor allele frequency (MAF) averaged over all significant SNP associations for the respective locus.
g Number of significant SNP associations in one or more environments at the P ≤ 0.0001 level for the respective locus.
h Number of significant SNP associations with the mean across environments at the P ≤ 0.0001 level for the respective locus.
i Number of different two-environment combinations with at least one significant SNP at the respective locus.
j Number of different environments for which at least one of the SNPs tagging the putative locus was significant.
k The average effect was calculated as the percent change in ureide concentration from back-transformed differences in ureide concentrations (major to minor allele) averaged over all significant SNPs tagging a locus.
For 26 of the 53 putative loci, the minor allele was associated with an increase in ureide concentration (negative values in Table 4). Of the five loci with the largest increases in ureide concentration associated with a minor allele, three were near two different putative hydroxyisourate hydrolase genes (Table 5). One was at locus 43 (chromosome 16, −22.8%; Table 4) and the other two were at loci 9 and 10 (both on chromosome 3, Table 4). The two loci on chromosome 3 are near each other and had the same effect on ureide concentration (−16.7%; Table 4). Potentially these loci are not independent. At least one hydroxyisourate hydrolase gene has been shown to play a role in ureide metabolism (Raychaudhuri and Tipton 2002). A more thorough examination of these two putative hydroxyisourate hydrolase genes in the accessions with the minor allele associations may provide previously unknown genetic variation associated with ureide concentrations in soybean. Interestingly, the locus with the second largest increase in ureide concentration (locus 12, −17.8%, Table 4) associated with a minor allele was also on chromosome 3. However, it is likely far enough away from the other putative loci on chromosome 3 to be independent. Two of the four loci with the largest increases in ureide concentration associated with the major allele were located on chromosome 10 (loci 24 and 26; 18.2% and 14.3%, respectively, Table 4 and Figure 3). Both loci were tagged by three SNPs each but significant associations were detected in all four environments (Table 4). No putative ureide-related gene was identified near locus 24 and a ureidoglycolate amidohydrolase (discussed below) was near the other. The other two loci with the largest effect (loci 6 and 35; Table 4) were located near different putative adenosine deaminase genes (Table 5). Interestingly, one of these (locus 35, Table 4; and see chromosome 13 in Figure 3) was also near one of the putative QTL identified by Hwang et al. (2013). Both of these loci were tagged by multiple SNPs (four and nine SNPs, respectively, Table 4) and significant associations were detected in all four environments (Table 4). Adenosine deaminases are involved in purine metabolism; however, their full role in plants is not well understood. In fact, Dancer et al. (1997) concluded that plants do not contain adenosine deaminase, although others have reported low levels (Edwards 1996).
The large effect of these two putative loci on ureide concentration and their location near putative adenosine deaminase genes may provide a path for research to investigate and more fully understand the role of adenosine deaminases in plants.
Table 5. Ureide-related genes identified near the putative loci shown in Figure 3.
Werner et al. (2013) examined two gene copies each of allantoate amidohydrolase (GmAAH1 and GmAAH2), ureidoglycine aminohydrolase (GmUGlyAH1 and GmUGlyAH2), and ureidoglycolate amidohydrolase (GmUAH1 and GmUAH2), which are all involved in ureide hydrolysis. Three of these genes, GmUAH1 (Glyma10g32850, chromosome 10), GmAAH2 (Glyma15g16870, chromosome 15), and GmUAH2 (Glyma20g34790, chromosome 20), were each near a different locus of the 53 putative loci (see Table 5 and Figure 3). For GmUAH2 (chromosome 20), one nearby significant SNP association was detected (Table 4, locus 53), while three (Table 4, locus 26) and two (Table 4, locus 40) nearby significant SNP associations marked the loci near GmUAH1 (chromosome 10) and GmAAH2 (chromosome 15), respectively. For all three loci, the major allele was associated with an increase in ureide concentration (Table 4) and, thus, the minor allele was associated with a decrease in ureide concentration. Potentially, the reduced ureide concentration in the accessions with the minor alleles near these genes might be associated with more active/efficient ureide metabolism. Reduced petiole ureide concentrations have been associated with increased drought-tolerant N-fixation, possibly through the elimination/reduction of feedback inhibition caused by a buildup of ureides (Sinclair and Serraj 1995; Serraj and Sinclair 1996; de Silva et al. 1996; Gordon et al. 1997; Purcell et al. 1998; Serraj et al. 1999, 2001; Vadez et al. 2000; King and Purcell 2005; Ladrera et al. 2007).
Thus, if these three loci are associated with greater ureide catabolism as a result of more active or more efficient alleles of these hydrolases, then the accessions with the minor allele may also exhibit greater drought-tolerant N-fixation. While no SNPs evaluated in this study were located between the start and stop positions of GmUAH1, GmAAH2, and GmUAH2, comparisons of the sequences of these genes between genotypes with the major and minor alleles for the nearby significant SNPs could be of interest. Additionally, confirmation of tolerance and more detailed investigation of these accessions may aid in the identification of the molecular mechanisms associated with drought-tolerant N-fixation. Duran and Todd (2012) identified four allantoinase (E.C. 3.5.2.5) genes that they designated GmALN1, GmALN2, GmALN3, and GmALN4 (corresponding to Glyma15g07910, Glyma13g31430, Glyma15g07920, and Glyma13g31420, respectively). GmALN1 and GmALN3 are on chromosome 15, and GmALN2 and GmALN4 are on chromosome 13. On both chromosomes, the two respective genes are very closely spaced (within approximately 11,000 bp). None of the 53 putative loci identified using the stringent conditions were near the location of these genes on either chromosome. However, at lower stringency (P ≤ 0.01), significant SNP associations were detected within 0.4 Mb (chromosome 13) and 0.6 Mb (chromosome 15) of the two pairs of genes. Thus, the identification of these SNPs near well-characterized ureide-related genes again indicates that very stringent criteria may mask true associations. This emphasizes the necessity for approaches that balance the need to identify true associations against the need to eliminate false positives, because lower stringencies also yield many more SNP associations that are likely false positives. Without independent information (e.g., Duran and Todd 2012; Hwang et al. 2013) or further confirmation, false positives at lower stringencies are especially problematic.
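The proximity check described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the SNP names, positions, P values, gene coordinates, and the strict threshold of 1e-4 are all hypothetical placeholders, and only the relaxed threshold (P ≤ 0.01) and the 0.4 Mb window come from the text. The sketch shows how a SNP set filtered by a stringent P-value cutoff can miss associations near a known gene that a relaxed cutoff recovers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    name: str       # hypothetical marker ID
    chrom: int      # chromosome number
    pos: int        # physical position in bp
    p_value: float  # association P value (made-up values below)


def distance_to_gene(snp_pos: int, gene_start: int, gene_end: int) -> int:
    """Distance in bp from a SNP to the nearest gene boundary (0 if inside)."""
    if snp_pos < gene_start:
        return gene_start - snp_pos
    if snp_pos > gene_end:
        return snp_pos - gene_end
    return 0


def snps_near_gene(snps, chrom, gene_start, gene_end, window_bp, p_max):
    """Return SNPs on `chrom` with P <= p_max lying within window_bp of the gene."""
    return [
        s for s in snps
        if s.chrom == chrom
        and s.p_value <= p_max
        and distance_to_gene(s.pos, gene_start, gene_end) <= window_bp
    ]


# Hypothetical gene on chromosome 13 spanning 1,000,000-1,004,000 bp.
GENE = dict(chrom=13, gene_start=1_000_000, gene_end=1_004_000)

# Hypothetical SNP results (not data from this study).
SNPS = [
    Snp("snp_a", 13, 1_300_000, 0.004),  # near gene, moderate P
    Snp("snp_b", 13, 900_000, 0.008),    # near gene, moderate P
    Snp("snp_c", 13, 2_000_000, 1e-7),   # highly significant but too far away
    Snp("snp_d", 15, 1_000_500, 0.001),  # different chromosome
    Snp("snp_e", 13, 1_002_000, 0.2),    # inside gene, not significant
]

WINDOW_BP = 400_000  # 0.4 Mb, as for chromosome 13 in the text

# Placeholder strict cutoff (assumption) versus the relaxed cutoff P <= 0.01.
strict_hits = snps_near_gene(SNPS, window_bp=WINDOW_BP, p_max=1e-4, **GENE)
relaxed_hits = snps_near_gene(SNPS, window_bp=WINDOW_BP, p_max=0.01, **GENE)

print([s.name for s in strict_hits])
print([s.name for s in relaxed_hits])
```

In this toy data set, the strict cutoff returns no SNP near the gene, whereas the relaxed cutoff returns two. The same relaxation applied genome-wide would also admit many associations with no supporting gene or QTL, which is why independent evidence is needed to interpret them.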
Other putative loci (Figure 3) were not near any gene annotated as ureide related in SoyBase. This may reflect a lack of knowledge about the function of genes in these regions; alternatively, even with the high stringency used, some of these putative loci may be false positives. Nonetheless, even though many of the other loci are corroborated by previously identified QTL or annotated genes, these loci also warrant further investigation.
In this study, 53 putative loci associated with ureide concentration were identified. Two of the putative loci were located near previously reported QTL associated with ureide concentration, and 30 loci were located near genes associated with ureide metabolism. Potentially, these results indicate variation in known genes that requires further investigation. The remaining loci may represent new genes affecting ureide concentration (biosynthesis, transport, degradation, etc.) and also warrant further in-depth investigation. Confirmation of these loci could be accomplished by quantifying segregation in appropriately constructed biparental mapping populations. Further investigations, such as expression analyses and sequencing of important known genes (e.g., specific uricase and amidohydrolase genes) near the loci identified, in the accessions carrying the minor SNP alleles may reveal novel insights into the regulation of ureide synthesis or catabolism. Ultimately, confirmation of the putative loci identified in this study will provide new sources of variation for use in breeding programs developing improved soybean cultivars.