Multivariate Analysis of the Cotton Seed Ionome Reveals a Shared Genetic Architecture

To mitigate the effects of heat and drought stress, a better understanding of the genetic control of physiological responses to these environmental conditions is needed. To this end, we evaluated an upland cotton (Gossypium hirsutum L.) mapping population under water-limited and well-watered conditions in a hot, arid environment. The elemental concentrations (ionome) of seed samples from the population were profiled in addition to those of soil samples taken from throughout the field site to better model environmental variation. The elements profiled in seeds exhibited moderate to high heritabilities, as well as strong phenotypic and genotypic correlations between elements that were not altered by the imposed irrigation regimes. Quantitative trait loci (QTL) mapping results from a Bayesian classification method identified multiple genomic regions where QTL for individual elements colocalized, suggesting that genetic control of the ionome is highly interrelated. To more fully explore this genetic architecture, multivariate QTL mapping was implemented among groups of biochemically related elements. This analysis revealed both additional and pleiotropic QTL responsible for coordinated control of phenotypic variation for elemental accumulation. Machine learning algorithms that utilized only ionomic data predicted the irrigation regime under which genotypes were evaluated with very high accuracy. Taken together, these results demonstrate the extent to which the seed ionome is genetically interrelated and predictive of plant physiological responses to adverse environmental conditions.

Files S1-S4
File S1. Best linear unbiased predictors (BLUPs) for soil elemental concentrations within the experimental field site. The Microsoft Excel file contains the BLUPs from the mixed linear model fitted for the soil samples taken at five depths from each of the neutron probe access sites in years 2010 and 2012. The Universal Transverse Mercator (UTM, based on North American Datum 1983) coordinates, X UTM and Y UTM positions, are provided for each of the neutron probe access sites. Elemental concentrations are reported in parts per million (ppm).
File S2. Best linear unbiased estimators (BLUEs) for seed elemental concentrations. The Microsoft Excel file contains the overall and by-year BLUEs from the fitted mixed linear model, calculated for the 95 recombinant inbred lines (RILs) and parents of the TM-1×NM24016 mapping population evaluated under water-limited (WL) and well-watered (WW) conditions. Parental lines were excluded for the purposes of quantitative trait loci (QTL) mapping. Seed elemental concentrations are reported in parts per billion (ppb).
File S3. Marker genotype data for the TM-1×NM24016 mapping population. The Microsoft Excel file contains the marker genotype scores for the 95 recombinant inbred lines (RILs) from the TM-1×NM24016 mapping population that were used in this work. The first three columns provide the genetic linkage map information from Gore, M. A. et al. 2014. The Plant Genome 7:1-10. Genotype marker data for the RILs are reported in columns D through CT, with a header labeled as NMXX where XX denotes the unique numeric identifier for each RIL that corresponds with the BLUEs in File S2. Marker data are coded for ICIM v. 4.0 with 2, 1, and 0 designating the homozygous parent 1 (TM-1), heterozygote, and homozygous parent 2 (NM24016) genotypic states, respectively, and -1 representing missing genotypic data. Genetic distances are reported in centiMorgans (cM).
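The marker coding described for File S3 can be illustrated with a short Python sketch. Only the numeric codes (2, 1, 0, and -1) come from the file description; the helper names and the example scores are hypothetical and are not taken from the file itself.

```python
# Numeric codes as described for File S3 (ICIM v. 4.0 coding).
# Helper names and example data below are hypothetical.
ICIM_CODES = {
    2: "TM-1",          # homozygous parent 1
    1: "heterozygote",
    0: "NM24016",       # homozygous parent 2
    -1: None,           # missing genotypic data
}


def decode_scores(scores):
    """Map ICIM-coded marker scores to genotype labels (None = missing)."""
    return [ICIM_CODES[s] for s in scores]


def missing_rate(scores):
    """Return the fraction of scores coded as missing (-1)."""
    return sum(1 for s in scores if s == -1) / len(scores)


# Made-up scores for one marker across six RILs.
example = [2, 2, 0, 1, -1, 0]
print(decode_scores(example))
print(missing_rate(example))
```

A check of this kind may be useful before mapping, since markers with many missing scores are often filtered; any such filtering threshold would be a choice of the analyst and is not stated in the file description.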
File S4. Integration of the TM-1×NM24016 genetic linkage map with the G. hirsutum L. acc. TM-1 draft genome sequence. The Microsoft Excel file contains the genetic linkage map information from Gore, M. A. et al. 2014. The Plant Genome 7:1-10, and the results from aligning marker context sequences to the G. hirsutum L. acc. TM-1 draft genome sequence (NBI assembly v1.1, Zhang, T. et al. 2015. Nature Biotechnology 33:531-537). A complete description of the alignment process, including software used, is provided in Pauli, D. et al. 2016a. G3 6:865-879. The assignment of linkage groups to the draft genome sequence does not represent definitive placement of markers with respect to physical position on the TM-1 genome.
NS, not significant at the < 0.05 level. * Significant at the < 0.05 level. ** Significant at the < 0.01 level. *** Significant at the < 0.001 level. **** Significant at the < 0.0001 level.
Table 2. Geostatistical model parameters used for interpolation of soil element concentrations (kriging) across the experimental field at the Maricopa Agricultural Center in Maricopa, AZ, using the estimated best linear unbiased predictors (BLUPs) from a fitted linear mixed model. "Variance model" is the covariance structure used to account for spatial relationships of sampled data points. "Nugget" is the measurement error plus the variation that occurs over distances less than the shortest sampling interval. "Range" is the distance (m) at which spatial dependencies are no longer present, i.e., sampling points are spatially independent, and "sill" is the maximum variance, which is reached at this distance. SS Error, sums of squares error for the fitted model; NA, not applicable.
Table 5. Phenotypic correlations with standard errors in parentheses between the 14 elements evaluated in the TM-1×NM24016 recombinant inbred line (RIL) mapping population evaluated under contrasting irrigation regimes, water-limited (WL, values above the diagonal) and well-watered (WW, below the diagonal). Field trials were conducted from 2010-12 at the Maricopa Agricultural Center located in Maricopa, AZ.
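The nugget, range, and sill parameters defined in the Table 2 caption can be illustrated with a minimal Python sketch of a spherical variogram. The spherical form is an assumption chosen for illustration because it is a common variance model; the caption does not state which models were fitted for each element. Here the sill is treated as the total sill (nugget plus partial sill), and the function name and example values are hypothetical.

```python
def spherical_variogram(h, nugget, sill, range_m):
    """Semivariance at lag distance h (m) under a spherical model.

    Assumption: `sill` is the total sill (nugget + partial sill).
    """
    if h <= 0:
        return 0.0  # semivariance is zero at zero separation by definition
    if h >= range_m:
        return sill  # beyond the range, points are spatially independent
    ratio = h / range_m
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


# Made-up parameters: nugget 0.2, total sill 1.0, range 50 m.
for lag in (0, 10, 25, 50, 100):
    print(lag, spherical_variogram(lag, nugget=0.2, sill=1.0, range_m=50.0))
```

The semivariance rises from the nugget at very short lags and levels off at the sill once the lag reaches the range, which is the behavior the caption describes.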

LG, linkage group. d. Bayes factor, converted posterior probability indicating likelihood of QTL presence. e. Beta, fitted regression coefficient for respective marker in model. f. SD, standard deviation of the regression coefficient, beta, from fitted model.
Table 8. Summary of significant quantitative trait loci (QTL) identified using the multi-trait analysis approach of seemingly unrelated regression (SUR) to map two independent groupings of elements in the TM-1×NM24016 recombinant inbred line (RIL) population evaluated under contrasting irrigation regimes, water-limited (WL) and well-watered (WW) conditions, at the Maricopa Agricultural Center, Maricopa, AZ. Select elements were combined into biologically relevant groupings based on biochemical function as outlined in Taiz and Zeiger (2006) and Mengel and Kirkby (2012). The "ionic" group consists of calcium, potassium, magnesium, and manganese, whereas the "redox" group is composed of iron, zinc, copper, nickel, and molybdenum. The P-value associated with each marker is significant at a Bonferroni-corrected threshold of α = 0.05. Marker positions are reported in centimorgans (cM). LG, linkage group.
Table 9. Comparison of marker loci that were below the critical threshold in the Bayesian classification method (Bayes factor of 100), but were detected using the multi-trait mapping method of seemingly unrelated regression (SUR) in two irrigation regimes, water-limited (WL) and well-watered (WW). Tables A and B display results associated with the multi-element groupings of ionic (Ca, K, Mg, and Mn) and redox (Cu, Fe, Mo, Ni, and Zn) in the multi-trait analysis. The P-value associated with the significant marker identified in the multi-trait analysis is significant at a Bonferroni-corrected α = 0.05. LG, linkage group. d. SUR P-value, P-value from the multi-trait analysis using seemingly unrelated regression.
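The two significance criteria named in these captions, a Bayes factor threshold of 100 and a Bonferroni-corrected α of 0.05, can be sketched in Python as follows. The Bayes factor is computed here with the standard definition as the ratio of posterior odds to prior odds; whether the authors used exactly this conversion is an assumption, and the prior probability, the number of tests, and the function names are hypothetical values chosen for illustration.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Convert a posterior probability of QTL presence to a Bayes factor.

    Standard definition: posterior odds divided by prior odds.
    Both probabilities must lie strictly between 0 and 1.
    """
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold that controls the family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical inputs: posterior 0.60 for a marker, prior 0.01 per marker.
bf = bayes_factor(0.60, 0.01)
print(bf, bf >= 100)  # compare against the critical Bayes factor of 100

# Hypothetical number of marker tests in the multi-trait scan.
print(bonferroni_threshold(0.05, 500))
```

Under these definitions, a marker can fall short of the Bayes factor threshold in a single-element analysis yet still pass the Bonferroni-corrected P-value threshold in a multi-trait scan, which is the situation Table 9 summarizes.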